(54) Information processing apparatus for prefetching data structure either from a main memory
or its cache memory



(57) To improve the function of a circuit for prefetching
data accessed by a processor, a prefetch unit (105)
incorporates therein a circuit for issuing requests to
read out one group of data to be prefetched and registers
for holding the group of data read in response to the
read requests. The group of data is read out
from a cache memory (1001) or a main memory (1) under
the control of a cache request unit (101). A plurality
of groups of data can be prefetched. When data designation
is made, the processor (2) requests the cache
memory (1001) to read a block to which the data to be
prefetched belongs. A circuit is also included in the
prefetch unit (105), wherein when prefetched data is
subsequently updated by the processor, the updated data
is made invalid. Elements of a vector complex in structure,
such as an indexed vector or the like, can also be
read out. It is also possible to cope with an interrupt
generated within the processor (2).



FIG. 1 
Description

BACKGROUND OF THE INVENTION 

The present invention relates to an information
processing apparatus for prefetching a group of data
having specific structures, such as array data, etc.,
from a memory such as a main memory or the like.

A conventional high-performance information
processing apparatus normally includes an instruction
processing device, a cache and a memory. The information
processing apparatus stores a program and data in the
memory and processes the data in the memory in accordance
with each instruction described in the program. The cache
is a memory means that has a short access time for the
instruction processing device and a relatively small
capacity, and that temporarily stores some of the programs
and data. Data necessary for the execution of an instruction
are read from the memory. At the same time, however, a data
block including the data is copied onto a line
forming the cache. When an access or reference to the
data in the block is designated subsequently, the data is
accessed or referred to in the line of the cache. The transfer
of the data block from the memory to the cache line is
called block transfer or line transfer. The case where no
necessary data exists in the cache upon instruction execution
is called a cache miss. When a cache miss occurs, the line
transfer is executed. In the conventional information
processing apparatus, when the line transfer occurs
during instruction execution, the execution of the
instruction is placed in a waiting state until the line transfer
is completed. Thus, when cache misses occur frequently,
a problem arises in that the time required to execute
the program increases due to the waiting incident to
the line transfer, and the processing performance of the
information processing apparatus is reduced. The problem is
particularly serious in technical calculations handling
large-scale data and in database processing.

On the other hand, an attempt has recently been made to
avoid a decrease in performance incident to the line transfer
by inserting into a program, in advance, a special instruction
for designating look-ahead of data and executing it prior to
an instruction that may cause a cache miss. The data
look-ahead is also called data prefetch or simply prefetch.
A memory access instruction to general purpose register
number 0 (hereinafter called a load GR0 type prefetch
instruction) is defined as one means for realizing the
prefetch in the PA7100 microprocessor of the U.S.
Hewlett-Packard Company (hereinafter may be called first
prior art). The load GR0 type prefetch instruction reads out
data located at a designated operand address but abandons
the result of reading after its readout. When a cache miss
occurs during this operation, the line transfer is executed
without the corresponding instruction being made to wait.
Thus, since the data is held in the cache when the
instruction for accessing the data is executed later on, the
degradation in performance can be avoided.

When, however, the prefetch instruction is used,
four problems described below arise. The first
problem is that when, for example, a program having a
loop structure that successively refers to or accesses
vector data is executed using the prefetch, it is
necessary to execute two instructions for each vector
element, namely a memory access instruction for loading
data into a register that stores an operand, and the
prefetch instruction, thereby causing a corresponding
increase in instruction processing time. Assuming now that
code obtained by adding the prefetch instruction to a
program for which large amounts of data are already held in
a cache is executed, there is the potential that the
instruction processing time increases as compared with the
program without the prefetch instruction and that the
performance is degraded instead.

The second problem is that since the prefetch instruction
is described in the program, when the execution of
the program is placed in a waiting state for some reason,
the execution of the prefetch instruction itself
is also placed in a waiting state, thereby reducing the
effect brought about by prefetching, namely that data
reading is started as early as possible. In order to avoid
this, the prefetch instruction is required to be issued at a
time earlier than its corresponding memory access
instruction. However, this causes a problem in that the
corresponding program becomes complex in structure
and the size of the program is increased.

The third problem is that since large-scale vector
data are successively loaded or captured into a cache
under the load GR0 type prefetch instruction, other
data already stored in the cache are expelled therefrom,
thereby causing the potential that cache misses increase
and consequently the performance of the information
processing apparatus is degraded.

The fourth problem is that since a data access is
performed in line units of the cache under the load GR0
type prefetch, even data that are never accessed are read
when the prefetch is applied to the access of
non-contiguous vector data, thereby causing a reduction in
performance.
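
The cost described in the fourth problem can be estimated with
a short calculation. The following is only an illustrative
sketch: the function name, the 64-byte line and the 8-byte
element are assumed values chosen for the example and do not
appear in this description.

```python
def line_utilization(line_bytes: int, elem_bytes: int, stride_elems: int) -> float:
    """Average fraction of each transferred cache line that is actually used.

    Assumes stride_elems >= 1 and elements no larger than one line.
    """
    stride_bytes = elem_bytes * stride_elems
    # While the stride is shorter than a line, one element is used per stride;
    # beyond that, only one element is used per transferred line.
    return elem_bytes / min(stride_bytes, line_bytes)


for stride in (1, 4, 16):
    print(stride, line_utilization(line_bytes=64, elem_bytes=8, stride_elems=stride))
```

Under these assumed sizes, the used fraction of each line falls
as the stride grows, which is the wasted readout referred to
above.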

As other prior art, there has been proposed a technique
for providing a prefetch unit which is initially set
by a processor and subsequently reads out data
asynchronously with the processor. Refer to, for example,
Nakazato et al., "Architecture and evaluation of
OCHANOMIZU-1", Research Report by Information Processing
Institute of Japan, Computer Architecture, No. 101-8, pp.
57-64, Aug. 20, 1993. The technique described in the
literature may be called second prior art. This prior art
discloses a technique for prefetching array data by every
processor in a system having a plurality of processors.
Namely, a technique is disclosed wherein
prefetch controllers for the processors prefetch a plurality
of elements in array data in accordance with addresses,
strides and lengths of the array data to be prefetched,
which are designated by the respective processors, and
the elements are respectively stored in prefetch buffers
provided so as to correspond to the processors. Another
report on the same machine is given in Totsuka et
al., "General purpose fine-grained parallel processor
OCHANOMIZU-1 -Architecture and Performance
Evaluation-", in Proc. of Parallel Processing Symposium
JSPP '94, pp. 70-S3, May 1994, by Information Processing
Institute of Japan.
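
As a rough illustration of how an address, a stride and a
length designate the elements to be prefetched, the sketch
below lists the element addresses. The function name and the
byte-addressed interpretation of the stride are assumptions
made for the example, not details taken from the cited reports.

```python
def element_addresses(base: int, stride: int, length: int) -> list[int]:
    """Addresses of `length` elements starting at `base`, `stride` bytes apart."""
    return [base + i * stride for i in range(length)]


# Four elements, 8 bytes apart, starting at address 0x1000.
print([hex(a) for a in element_addresses(0x1000, 8, 4)])
```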

SUMMARY OF THE INVENTION 

According to the second prior art, the problems
described in connection with the first prior art are reduced.
However, the following problems arise upon realization
of the second prior art.

In the second prior art, a cache memory for holding
data stored in a main storage is disclosed so as to
correspond to each processor. However, no description is
given of how to use the cache memory when data is
prefetched by the prefetch unit.

Further, in the second prior art, only a technique for
prefetching a plurality of data having addresses separated
from one another at predetermined address intervals,
as in the case of the plurality of data in the above
array, is disclosed. However, data whose addresses are
determined by another group of data, like the elements
of an indexed vector, also exist within the groups of data
processed by a program.

Other various problems to be solved are involved in 
the second prior art to put the second prior art into prac- 
tical use. 

An object of the present invention is to solve the
problems of the second prior art and provide an
information processing apparatus having a prefetch circuit
with a higher function.

A more specific object of the present invention is to
provide an information processing apparatus capable of
speeding up prefetching by skillfully utilizing a cache
memory provided for a memory such as a main memory or the
like.

Another object of the present invention is to
provide an information processing apparatus capable of
prefetching a plurality of groups of data.

A further object of the present invention is to provide
an information processing apparatus capable of
prefetching a group of data forming an indexed vector.

A still further object of the present invention is to
provide an information processing apparatus which has
solved other practical problems involved in the second
prior art referred to above.

According to a first invention of the present application,
for achieving the above objects, a prefetch data request
circuit is provided, within a prefetch circuit for
prefetching a group of data whose prefetching is requested
by a processor, as a circuit activated to prefetch the group
of data. The prefetch data request circuit successively
outputs a group of read requests for the group of data to a
cache control circuit for controlling accesses to a cache
memory, and successively stores the group of data supplied
in response to the group of read requests in a group of
sequentially-ordered storage regions provided within the
prefetch circuit.

Further, the cache control circuit includes, as circuits
activated upon prefetching the group of data:

a circuit for transferring any of the group of data des- 
ignated by the group of read requests from the 
cache memory to the prefetch circuit when any of 
the group of data is held in the cache memory, and 
a prefetch data read request circuit for requesting 
the storage control circuit to read at least data des- 
ignated by any of the group of prefetch data read 
requests from the storage device when the desig- 
nated data is not held in the cache memory. 

The storage device control circuit includes a stor- 
age device access circuit for supplying data designated 
by a request issued from the prefetch data request cir- 
cuit to the prefetch circuit. 

Further, the prefetch circuit includes therein, as a
circuit activated when the processor utilizes the group
of already prefetched data:

a prefetch data supply circuit for detecting whether
data designated by the data read request issued from
the processor is held in the group of storage regions and
for transferring, when the designated data is held in the
group of storage regions, the held data to the processor.

Thus, when the prefetch circuit requests the data 
held within the cache memory upon prefetching the 
group of data, the data can be supplied from the cache 
memory to the prefetch circuit, thereby making it possi- 
ble to prefetch the data at high speed. 
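
The cooperation of the circuits of the first invention can be
pictured with a small software model. This is a minimal sketch
under stated assumptions, not the circuit itself: the class
name, the use of dictionaries for the cache memory and the
storage device, and the `sources` list are hypothetical and
serve only to make the data flow visible.

```python
class PrefetchCircuitModel:
    """Toy model: prefetch from the cache when it holds the data, else from storage."""

    def __init__(self, cache: dict, memory: dict):
        self.cache = cache      # address -> data, models the cache memory
        self.memory = memory    # address -> data, models the storage device
        self.regions = []       # sequentially ordered storage regions: (address, data)
        self.sources = []       # where each prefetched element came from

    def prefetch(self, addresses):
        # Issue the group of read requests one after another, in order.
        for addr in addresses:
            if addr in self.cache:
                data, source = self.cache[addr], "cache"
            else:
                data, source = self.memory[addr], "memory"
            self.regions.append((addr, data))
            self.sources.append(source)

    def supply(self, addr):
        # Models the prefetch data supply circuit: return held data, or None.
        for held_addr, data in self.regions:
            if held_addr == addr:
                return data
        return None


memory = {0: "a", 8: "b", 16: "c"}
cache = {8: "b"}
unit = PrefetchCircuitModel(cache, memory)
unit.prefetch([0, 8, 16])
print(unit.sources, unit.supply(8))
```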

In a more preferred embodiment of the present invention,
a data transfer prohibit circuit, which prohibits
the cache control circuit from transferring the designated
data from the cache memory to the processor when
the designated data is held in the group of storage
regions, is provided within the prefetch circuit.

According to a second invention of the present ap- 
plication, the same data is held in both a cache memory 
and a prefetch circuit. When, however, a processor uses 
prefetched data subsequently, the prefetch circuit pro- 
vides the processor with the data and the prohibit circuit 
prohibits the cache memory from providing it with the 
data. As a result, the same data can be prevented from 
being doubly supplied to the processor. Further, the 
prefetch circuit can properly manage whether or not the 
prefetched data has been read into the processor. 

In another preferred embodiment of the present invention,
a circuit is provided within the cache control circuit for
requesting, when any of the group of data subjected to a
data prefetch request does not exist within the cache
memory, the cache memory to transfer a block including the
data. Thus, when a plurality of other data belonging
to the block exist within the group of data requested
by the processor, the other data can be prefetched
at high speed using the data within the
transferred block. However, there is also the potential
that, of the plurality of data constituting a group of
data to be prefetched, few or none belong simultaneously
to the same block. In this
case, the third and fourth problems described for the first
prior art referred to above arise. Thus, in the more
preferred embodiment of the present invention, a circuit is
provided which controls whether or not to perform the
block transfer in dependence on information designated
as part of the data prefetch requests by the processor.

In the second invention of the present application
as well, a circuit for supplying a plurality of data prefetch 
requests for designating, as some of the data prefetch 
requests, numbers for base registers used to calculate 
addresses of a group of data to be prefetched, to the 
prefetch circuit, is provided within the processor to allow 
a plurality of groups of data to be prefetched. 

The prefetch circuit includes therein: 

a plurality of groups of sequentially-ordered storage
regions each used for a group of prefetched data;
a circuit responsive to each of a plurality of data
prefetch requests issued by the processor, for assigning
one group of storage regions within the plurality
of groups of storage regions;
a circuit for storing therein, in correspondence to
each group of storage regions, a base register
number designated by a data prefetch request
which has been assigned to each group of storage
regions; and

a circuit for storing therein a base register number
designated by a data prefetch request which has
been assigned to each group of storage regions.

Thus, the plurality of groups of storage regions are 
associated with different base register numbers desig- 
nated by different data prefetch requests. 

Further, a circuit is provided within the processor for
supplying, when each data read request is supplied to the
prefetch circuit in response to a data read instruction for
requesting the storage device to read data, a base register
number designated by the instruction to the prefetch circuit
as part of the data read request.

The prefetch circuit includes therein: 

a circuit responsive to a data read request issued
from the processor, for detecting, based on a base
register number stored so as to correspond to each
group of storage regions, whether a group of storage
regions assigned to a data prefetch request that
has designated the base register number designated
by the data read request exists; and
a prefetch data supply circuit for transferring, in
case a group of storage regions assigned to the base
register designated by the data read request exists,
the prefetched data held in the group of storage
regions to the processor.



Thus, after a plurality of groups of data have been
prefetched and the prefetched data have been held in
the prefetch circuit, the plurality of groups of storage
regions can be utilized by different base register numbers
designated by different data read requests executed by
the processor.
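
The association between groups of storage regions and base
register numbers can be sketched as follows. This is an
illustrative model only: the class and method names are
hypothetical, and the policy of taking the first free group is
an assumption, since the assignment policy is not stated here.

```python
from collections import deque


class PrefetchGroups:
    """Toy model: each group of storage regions is tagged with a base register number."""

    def __init__(self, num_groups: int):
        self.base_regs = [None] * num_groups            # tag stored per group
        self.regions = [deque() for _ in range(num_groups)]

    def assign(self, base_reg: int, data) -> int:
        # Assumed policy: take the first unassigned group.
        slot = self.base_regs.index(None)  # raises ValueError if every group is in use
        self.base_regs[slot] = base_reg
        self.regions[slot].extend(data)
        return slot

    def read(self, base_reg: int):
        # A data read request carries the base register number of its instruction.
        if base_reg not in self.base_regs:
            return None                    # no group assigned to this base register
        slot = self.base_regs.index(base_reg)
        if not self.regions[slot]:
            return None                    # group assigned but nothing left to supply
        return self.regions[slot].popleft()


groups = PrefetchGroups(num_groups=2)
groups.assign(5, [10, 11])   # group prefetched via base register 5
groups.assign(7, [20])       # group prefetched via base register 7
print(groups.read(7), groups.read(5), groups.read(5))
```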

Further, according to a third invention of the present
application, a circuit is provided within the prefetch
circuit which, in order to prefetch a group of data forming
an indexed vector, first prefetches another data group used
as an index for the indexed vector, holds it within the
prefetch circuit, and prefetches the group of data using
that other data group. Thus, the indexed vector can
be prefetched by using the prefetch circuit.

Furthermore, according to other inventions of the
present application, an information processing apparatus
is provided which has solved other problems.
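
The two-step prefetch of an indexed vector described for the
third invention can be sketched in software. This is a minimal
illustration: the function name, the dictionary used as memory
and the rule that each index is scaled by the element size are
assumptions made for the example.

```python
def prefetch_indexed(memory: dict, index_base: int, data_base: int,
                     length: int, elem_bytes: int = 8) -> list:
    """Prefetch the index group first, then use it to fetch the data group."""
    # Step 1: prefetch the data group used as the index and hold it.
    index_group = [memory[index_base + i * elem_bytes] for i in range(length)]
    # Step 2: use each held index to form the address of a vector element.
    return [memory[data_base + idx * elem_bytes] for idx in index_group]


memory = {
    0x100: 2, 0x108: 0, 0x110: 3,                      # index vector
    0x200: "a", 0x208: "b", 0x210: "c", 0x218: "d",    # data vector
}
print(prefetch_indexed(memory, index_base=0x100, data_base=0x200, length=3))
```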

BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1 shows a view showing the overall configuration
of an information processing apparatus according
to one embodiment of the present invention.

Fig. 2 shows a view schematically illustrating the
configuration of a prefetch unit employed in the
information processing apparatus shown in Fig. 1.

Fig. 3 shows a view schematically showing the
configuration of a prefetch status control unit employed in
the information processing apparatus shown in Fig. 1.

Fig. 4 shows a view schematically depicting the
configuration of a prefetch request control unit
employed in the information processing apparatus shown
in Fig. 1.

Fig. 5 shows a view schematically showing the
configuration of a prefetch address control circuit employed
in the information processing apparatus shown in Fig. 1.

Fig. 6 shows a view schematically illustrating the
configuration of a PDSR update control circuit employed
in the information processing apparatus shown in Fig. 1.

Fig. 7 shows a view schematically depicting the
configuration of a prefetched-data unit employed in the
information processing apparatus shown in Fig. 1.

Fig. 8 shows a view showing the configuration of a
prefetched-data read control unit employed in the
information processing apparatus shown in Fig. 1.

Fig. 9 shows a view schematically illustrating the 
configuration of a cache request unit 101 employed in 
the information processing apparatus shown in Fig. 1. 

Fig. 10 shows a view schematically showing the
configuration of a cache memory unit 3 employed in the
information processing apparatus shown in Fig. 1.

Fig. 11 shows a view schematically illustrating the
configuration of a processor 2 employed in the
information processing apparatus shown in Fig. 1.

Fig. 12 shows a state transition diagram of PDR
elements employed in the information processing
apparatus shown in Fig. 1.

Fig. 13Aa shows a structure of a sequential vector
to be prefetched by the information
processing apparatus shown in Fig. 1.

Fig. 13Ab shows a program for contiguous access
of the sequential vector.

Fig. 13Ac shows a program for stride access of the
sequential vector.

Fig. 13Ba shows a structure of an indexed vector to
be prefetched by the information
processing apparatus shown in Fig. 1.

Fig. 13Bb shows a program for accessing the
indexed vector.

Fig. 13Ca shows a structure of an array of simple
linked lists to be prefetched by the
information processing apparatus shown in Fig. 1.

Fig. 13Cb shows a program for accessing the array
of simple linked lists.

Fig. 13Da shows a structure of an array of trees to
be prefetched by the information
processing apparatus shown in Fig. 1.

Fig. 13Db shows a program for accessing the array
of trees.

Fig. 14A shows the configuration of the first group
of a prefetch control register employed in the information
processing apparatus shown in Fig. 1.

Fig. 14B shows the configuration of the second 
group of the prefetch control register. 

Fig. 14C shows the configuration of the third group 
of the prefetch control register. 

Fig. 14D shows the configuration of the fourth group
of the prefetch control register.

Fig. 15A illustrates information for designating
sequential vector data used in the information processing
apparatus shown in Fig. 1.

Fig. 15B illustrates information for designating
indexed vector data used in the information processing
apparatus shown in Fig. 1.

Fig. 15C illustrates information for designating a
linked list used in the information processing apparatus
shown in Fig. 1.

Fig. 16A shows an example of a source code used
to access a two-dimensional array employed in the
information processing apparatus shown in Fig. 1.

Fig. 16B shows an example of a two-dimensional 
array accessed by the source code. 

Fig. 17A shows a general format for instructions
employed in the information processing apparatus
shown in Fig. 1.

Fig. 17B shows a format of a basic part of a memory
access instruction employed in the information processing
apparatus shown in Fig. 1.

Fig. 17C shows a format of a basic part of a
computation instruction employed in the information
processing apparatus shown in Fig. 1.

Fig. 17D shows a general format of an extension
part of an instruction employed in the information
processing apparatus shown in Fig. 1.

Fig. 18 shows a view depicting the configuration of
a mask register employed in the information processing
apparatus shown in Fig. 1.

Fig. 19A shows an example of a status of a PDR
and a PDSR before control is shifted to an interrupt
processing routine in the information processing
apparatus shown in Fig. 1.

Fig. 19B shows an example of a status of the PDR
and the PDSR after control is returned to an interrupted
program in the information processing apparatus shown
in Fig. 1.

Fig. 20 shows a time chart for describing coherency
assurance under the control of hardware in the
information processing apparatus shown in Fig. 1.

Fig. 21A shows an example of a program used for
describing coherency assurance under look-ahead
control in the information processing apparatus shown in
Fig. 1.

Fig. 21B shows a time chart for describing the
coherency assurance under look-ahead control in the
information processing apparatus shown in Fig. 1.

Fig. 22A shows an example of a source program
employed in the information processing apparatus
shown in Fig. 1.

Fig. 22B shows an example of a program obtained
after triple loop unrolling of the source program.

Fig. 23 shows a view for describing a loop object
program of the program example 1.

Fig. 24 shows a view for describing an upper half of
an instruction execution trace of the program example 1.

Fig. 25 shows a view for describing a lower half of
the instruction execution trace of the program example
1.

Fig. 26A shows an example of another source
program employed in the information processing apparatus
shown in Fig. 1.

Fig. 26B shows an example of a program obtained
after triple loop unrolling of the other source program.

Fig. 27 shows a view for describing a loop object
program of the program example 2.

Fig. 28 shows a view for describing an instruction
execution trace (upper half) of the program example 2.

Fig. 29 shows a view for describing an instruction
execution trace (lower half) of the program example 2.

Fig. 30A shows an example of yet another
source program employed in the information processing
apparatus shown in Fig. 1.

Fig. 30B shows an example of a program obtained
after triple loop unrolling of the yet another source
program.

Fig. 31 shows a view for describing a loop object
program of the program example 3.

Fig. 32 shows a view for describing an instruction
execution trace (upper half) of the program example 3.

Fig. 33 shows a view for describing an instruction
execution trace (lower half) of the program example 3.

DESCRIPTION OF AN EMBODIMENT 

An information processing apparatus according to 
the present invention will hereinafter be described in
further detail with reference to embodiments illustrated in
the accompanying drawings.

<Outline of apparatus> 

Fig. 1 shows the overall structure of an embodiment 
of an information processing apparatus according to the 
present invention. However, peripheral devices such as 
an input/output device, etc. are omitted. 

Reference numeral 1 indicates a memory which
stores a program and data therein. Reference numeral
2 indicates a processor which successively reads
instructions included in the program stored in the memory
1 and successively executes them. Reference numeral
3 indicates a cache memory unit which incorporates
therein a cache memory 1001 for storing partial copies
of the contents of the memory 1. Reference numeral 4
indicates a system control unit, which is a circuit for
accessing the cache memory 1001 or the memory 1 in
response to a memory request (memory access request)
issued by the processor 2. A major feature of the present
embodiment is that a prefetch unit 105 is included.

This circuit will be described in detail later. Therefore,
only outlines of the prefetch unit 105 and its data prefetch
operations are described here. Based on prefetch control
information initially set by the program executed by the
processor 2, the prefetch unit 105 looks ahead (prefetches)
a group of data designated by the prefetch control
information, e.g., a plurality of elements in a certain
array, asynchronously with the instruction execution
performed by the processor 2. The prefetched data is
temporarily stored in one of a plurality of prefetched-data
registers (PDR) (704 (Fig. 7)) provided within the prefetch
unit 105. Further, when a memory access instruction that
needs the data is executed by the processor 2 later, the
prefetch unit 105 transfers the prefetched data designated
by this instruction from the register to the processor 2.
Thus, when the data is not held in the cache memory 1001,
the data can be supplied to the processor 2 faster than
in the conventional case where the data is
supplied to the processor 2 from the memory 1.

The prefetch control information is created in advance in a
general purpose register GR in the processor 2 in
accordance with the program being executed by the
processor 2. Further, the prefetch control information is
transferred from the processor 2 to the inside of the
prefetch unit 105 through a signal line or conductor 106 in
accordance with an initialize instruction and is written
into a prefetch status control register (PFR) provided
within the prefetch unit 105. As will be described later,
the PFR denotes a combination of a prefetch status register
(PSR) (301 (Fig. 3)) and a prefetch control register (PCR)
(302 (Fig. 3)). A PFR write register number PFRWN
indicative of the number of the register to be written is
designated by the initialize instruction and is supplied to
the prefetch unit 105 from the processor 2.

The prefetch unit 105 prefetches a group of data
designated by the prefetch control information under the
control of a cache request unit 101 in the following
manner. The prefetch unit 105 first sends, to the cache
request unit 101 through a signal line 107, a prefetch
request PFREQ, a prefetch address PFA, a REQPCRN
indicative of the number of the PCR holding the prefetch
control information about the data to be prefetched,
a PDRTP signal indicative of the position in a PDR where
the prefetched data is to be stored, and a BUF signal for
designating the presence or absence of line transfer.

The cache request unit 101 detects whether the data
designated by the prefetch address has been held in the
cache memory 1001. If it is detected that the data has
been held in the cache memory 1001, then the cache
request unit 101 sends a cache address CADR
corresponding to this address PFA to the cache unit 3 through
a line 1 1 32 and requests the cache unit 3 to read out the
data. This data CDATA is transferred from the cache
memory 1001 to the prefetch unit 105 through a line 108.
The cache request unit 101 sends a data delivery
instruction CADV signal, a PDR number CWPDRN, and
a PDR-in storage position CPDRIP to the prefetch unit
105 through a signal line 120. The prefetch unit 105
allows the PDR to store the data CDATA therein in
accordance with these signals. When the data requested from
the prefetch unit 105 is not included in the cache memory
1001, the cache request unit 101 sends a memory
read request MREG and a memory address MADR to a
memory request unit 103 to request the memory request
unit 103 to read the data from the memory 1.

The memory request unit 103 sends these signals
to the memory 1 and transmits a data delivery instruction
MADV, a PDR number MWPDRN and a PDR-in storage
position MPDRIP to the prefetch unit 105 through a signal
line 121. When the data is read out from the memory
1, a memory data unit 104 sends the data MDATA to the
prefetch unit 105 through a line 109. When no line
transfer is designated for the data subjected to a prefetch
request, the data is not written into the cache memory 1001.

The prefetch unit 105 repeatedly performs the above
operations on a group of data to be prefetched. What is
important here is that the data access requests are
outputted from the prefetch unit independently of the
processor 2, as described above, in order to process the
data. Each PDR has a plurality of sequentially-ordered
storage positions where the prefetched group of
data are held, and successively holds the prefetched
group of data at different storage positions in accordance
with the order of these storage positions. Further,
when the readout of these data is requested, the
prefetched data are successively read out in accordance
with these storage positions, as will be described
later. This makes it easy to control the write and readout
positions of the group of data. Each PDR is suited
for holding therein a group of data associated with a
structure that is complex as compared with a simple vector.
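
The sequentially-ordered storage positions of a PDR can be
modeled as a ring of slots with a request pointer and a read
pointer. The sketch below is illustrative only: the class name,
the three slot states and the method names are assumptions made
for the example and are not the states of Fig. 12.

```python
class PrefetchedDataRegister:
    """Toy PDR: positions are reserved in order, filled on delivery, read in order."""

    def __init__(self, size: int):
        self.data = [None] * size
        self.state = ["empty"] * size   # assumed states: empty, reserved, full
        self.next_request = 0           # position handed out with the next request
        self.next_read = 0              # position supplied on the next read

    def reserve(self):
        # Called when a prefetch request is issued; returns the storage position.
        pos = self.next_request
        if self.state[pos] != "empty":
            return None                 # no free position, request must wait
        self.state[pos] = "reserved"
        self.next_request = (pos + 1) % len(self.data)
        return pos

    def fill(self, pos: int, value):
        # Called when data is delivered, possibly out of request order.
        if self.state[pos] != "reserved":
            raise ValueError("position was not reserved")
        self.data[pos] = value
        self.state[pos] = "full"

    def read(self):
        # Supplies data strictly in storage-position order.
        pos = self.next_read
        if self.state[pos] != "full":
            return None                 # data for this position has not arrived yet
        self.state[pos] = "empty"
        self.next_read = (pos + 1) % len(self.data)
        return self.data[pos]
```

Because each request carries its storage position, data that
arrives late from the memory and data that arrives early from
the cache still end up in request order in this model.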
If the line transfer is designated for the data to be
prefetched, then the cache request unit supplies a line
transfer request LT to the memory request unit 103
when the memory request unit 103 is requested to read
the data to be prefetched. In this case, the memory data
unit 104 additionally performs an operation for
sending the data of one block read out in accordance with
the line transfer request LT to the cache memory unit 3
through the line 109, a cache data unit 102 and a line
114. When a prefetch request is made for a plurality of
data included in the same block, it is effective to carry
out the line transfer. In the present embodiment, whether
or not the line transfer should be done can be designated
by a program through the value of the BUF
signal for designating the absence or presence of the
line transfer, which is included in the prefetch control
information.
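
The effect of the BUF designation on a prefetch that misses the
cache can be sketched as follows. This is a simplified model
under stated assumptions: memory is a list of words, the cache
is a dictionary keyed by word address, and the four-word block
size and function name are hypothetical.

```python
def prefetch_on_miss(memory: list, cache: dict, addr: int,
                     buf: bool, line_words: int = 4):
    """Read one word for the prefetch unit; copy its block to the cache only if buf is set."""
    if buf:
        # Line transfer designated: the whole block containing addr goes to the cache.
        start = addr - addr % line_words
        for a in range(start, start + line_words):
            cache[a] = memory[a]
    # In either case the requested word itself goes to the prefetch unit.
    return memory[addr]


memory = list(range(100, 116))
cache = {}
print(prefetch_on_miss(memory, cache, 5, buf=False), sorted(cache))
print(prefetch_on_miss(memory, cache, 5, buf=True), sorted(cache))
```

With the designation cleared, the cache contents are left
untouched, which avoids expelling other data when few elements
of the group share a block.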

When the processor 2 executes a memory access 
instruction which requests readout of data, the proces- 
sor 2 sends a memory access request PRREQ and a 
memory address PRADR to the cache request unit 101 
as usual regardless of whether data designated by the 
request has been subjected to the prefetch request, and 
at the same time transmits a memory request PRREQ,
a memory access instruction decode information LD
and a base register number BRN of the above memory 
access instruction to the prefetch unit 105 through the 
signal line 106. 

The prefetch unit 105 makes a decision as to whether
the requested data is held in any one of a plurality of
PDRs provided therein. When the requested data exists
in a PDR, the prefetch unit 105 transmits the data DATA
and a PDR hit signal PDRHIT to the cache data unit 102
through a signal line 110. Further, the cache data unit
102 transfers them to the cache memory unit 3 through
the signal line 114. Thereafter, they pass through the
cache memory unit 3 and are finally transferred to the
processor 2 through the signal line 103. To this end, the
prefetch unit 105 sends a data transfer instruction
PDRHIT to the cache memory unit 3 through a signal
line 111, whereby the above data transferred via the
signal line 114 is outputted to the signal line 103 by way of
a bypass, to be described later, provided inside the cache
memory unit 3.

In the present embodiment, since data is permitted to be
read out from the cache memory 1001 when it is
prefetched, there is the potential that data whose readout
is requested by the processor 2 exists in one of the PDRs
of the prefetch unit 105 and exists in the cache memory
1001 as well. In the present embodiment, data that exists
in a PDR is supplied from the PDR and is not read out
from the cache memory. Therefore, when the PDR hit
signal is supplied to the cache memory unit 3, the cache
memory unit 3 sends the data DATA read out of the PDR
to the line 108. Thus, as will be described in detail later,
the data in the prefetch unit 105 can be managed
without depending on the presence or absence of a hit
in the cache memory 1001, by setting the prefetch unit
and the cache memory 1001 so as not to output the same
data and by preferentially sending the data in the
prefetch unit 105 to the processing apparatus.

When the prefetch unit 105 is not hit by the
memory access request for reading out data from
the processor, the data is read out from the cache
memory 1001 and the memory 1 in the following manner.
The cache request unit 101 receives a memory request
PRREQ and an operand address PRADR supplied
from the processor 2 through a signal line 122 and
checks whether a data block including the data designated
by this request exists in the cache memory 1001.
If the answer is Yes, then the cache request unit 101
sends a cache memory address CADR for the corresponding
data block to the cache memory unit 3 through
a signal line 113. If the answer is No, then the cache
request unit 101 issues a memory request MREQ and
an address MADR for reading out the corresponding data
to the memory request unit 103 through a signal line
115. When the cache memory 1001 is not hit by the
data memory access request issued from the processor,
as is normally the case in the present embodiment, the
memory request unit 103 requests the memory 1 to read
out a block including the data at the memory address MADR
and the memory data unit 104 transfers the data held in
the block to the cache memory 1001.

If the memory access request PRREQ supplied
from the processor 2 is a data write request, then the write
data is transferred from the processor 2 to the cache
data unit 102 through a signal line 119 and is subsequently
written into the cache memory 1001 through the
signal line 114. Since the method of writing data into
the cache is a so-called store-in method in the
present embodiment, it is unnecessary to transfer the
write data to the memory 1. The data is reflected
in the memory 1 by writing back the corresponding
data block into the memory 1 when a modified
data block in the cache memory 1001 is expelled from
a line. For this purpose, the data block read out from the
cache memory 1001 is transferred to the memory data unit 104
through the signal line 106 and is subsequently written
into the memory 1 through a signal line 113.

A problem incident to the memory access request
for this data writing is as follows. When the processor 2
executes a store instruction that changes data
after the data has been read out by prefetching, the
prefetched data becomes invalid if this store instruction
precedes, in program order, the memory access instruction
that accesses the data. The present embodiment also has
an important feature in that at least part of the addresses
of the prefetched data is held along with the
prefetched data, and the addresses of the store instructions
executed by the processor 2 are compared with the held
part one by one. If there is a possibility that given prefetched
data has been changed and thereby invalidated, this is
detected, and the changed data is fetched by
accessing the memory other than the PDR upon execution
of the memory access instruction for the invalid
data, thereby solving the problem. For the sake of the
above control, the processor 2 sends a store instruction
execution command ST and an operand address
PRADR to the prefetch unit 105 through the signal line
106. Further, the prefetch unit 105 transmits a PDRHIT
signal to the cache request unit 101.
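
The store check described above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the number of address bits kept per element (`TAG_BITS`), the `Element` class and the function names are hypothetical, and memory is modelled as a plain dictionary.

```python
TAG_BITS = 12  # number of low-order address bits kept per element (assumed)


def tag(addr):
    """Partial address held along with a prefetched element."""
    return addr & ((1 << TAG_BITS) - 1)


class Element:
    def __init__(self, addr, value):
        self.tag = tag(addr)
        self.value = value
        self.invalid = False  # set when a store may have changed the data


def on_store(elements, store_addr):
    """Compare the store address with every held partial address."""
    for e in elements:
        if e.tag == tag(store_addr):
            e.invalid = True  # possible update: invalidate conservatively


def on_load(element, addr, memory):
    """Use the prefetched value unless a store may have invalidated it."""
    return memory[addr] if element.invalid else element.value


memory = {0x1000: 5, 0x1004: 6}
elements = [Element(0x1000, 5), Element(0x1004, 6)]
memory[0x1004] = 99           # the store updates memory ...
on_store(elements, 0x1004)    # ... and marks the matching element
print(on_load(elements[0], 0x1000, memory), on_load(elements[1], 0x1004, memory))
```

Comparing only part of the address can flag an element that was not actually changed. That costs one extra access to the cache or memory but never returns stale data.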

In the present embodiment as well, both when the processor
outputs the prefetch control information to the
prefetch unit 105 and when it issues the memory access
request, it designates the base register number BRN.
Thus, the PDR for holding the prefetched
group of data therein is associated with the base register
number designated by the prefetch control information.
Thereafter, it is detected, based on the base
register number designated when the processor 2
outputs a memory access request for readout
of data, whether a PDR associated with that base register
number exists. If the corresponding PDR exists,
then one of the group of prefetched data in the PDR is
read out. Thus, a circuit for holding a plurality of groups
of data therein is simplified.
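
As a rough sketch of this association, the lookup below matches the base register number of a memory access against the number recorded for each PDR at setup. The `PCR` class, its `iid` and `valid` attributes and the function name are illustrative assumptions, not the circuit of the embodiment.

```python
class PCR:
    """Control register paired with one PDR (only the fields needed here)."""

    def __init__(self, iid):
        self.iid = iid     # base register number recorded at prefetch setup
        self.valid = True  # assumed flag: this PCR/PDR pair is in use


def find_pdr(pcrs, brn):
    """Return the number of the PDR associated with base register brn, or None."""
    for m, pcr in enumerate(pcrs):
        if pcr.valid and pcr.iid == brn:
            return m
    return None


pcrs = [PCR(iid=3), PCR(iid=7)]
print(find_pdr(pcrs, 7), find_pdr(pcrs, 5))
```

The key is a small register number, so a match needs no comparison of full memory addresses.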

Further, in the present embodiment, a group of data
having a complex structure, such as the elements constituting
an indexed vector, a linked vector or the like, can also be
prefetched, as well as a group of data constituting a simple
vector.

Furthermore, in the present embodiment, a so- 
called address skip can be also realized wherein an in- 
terval between addresses for data to be prefetched is 
changed according to the data to be prefetched in order 
to allow a group of data composed of data that belong 
to a plurality of rows in an array to be continuously 
prefetched. 

Still further, the present embodiment can be applied 
even to the case where an IF statement exists in a loop 
if prefetched data are skipped by empty transfer even 
when a conditional memory access instruction is sup- 
pressed from execution. 

Still further, the present embodiment can also cope
with an interrupt that takes place during instruction
execution by the processor 2. Namely, the processor
2 notifies an interrupt signal INT to the prefetch
unit through the signal line 106. Upon receipt of this, the
prefetch unit pauses its prefetch operation. When the
paused prefetch operation is resumed after completion
of interrupt processing, save and recovery of PCR and
PSR are executed. For this purpose, the processor 2 sends
register address information PFRRN for reading out these
prefetch-related registers to the prefetch unit
through the signal line 106. The read information is sent
to the processor 2 through a signal line 112.

The structure and operations of the present embodiment
will hereinafter be described in more detail, inclusive
of other characteristics of the present embodiment.



<Processor> 

Fig. 11 is a view schematically showing the structure
of the processor 2. Reference numeral 1101 indicates
an instruction control unit which has an instruction fetch
circuit 1101A for fetching an instruction from the memory
1 and an instruction decoder 1101B for decoding it.
Reference numeral 1102 indicates a request control unit
which generates a memory request PRREQ for making
an instruction fetch request and requesting reading or
writing of memory data, based on a request issued from
the instruction control unit 1101. A small-capacity
intraprocessor cache 1102A can also be provided inside
the request control unit 1102. Reference numeral 1103
indicates a computation control unit which includes
therein a functional unit 1103A, a general purpose register
(GR) group 1103B, a floating point register group
1103C, a mask register 1103D, etc.

The request control unit 1102 sends the memory request
PRREQ and its address PRADR to the cache request
unit 101 through the signal line 122 and transmits
write data to the cache data unit 102 through the signal
line 119 upon writing. For a request made under
a memory access instruction, the request control
unit 1102 receives the decoded information LD and
the base register number BRN of the instruction from the
instruction control unit 1101 and receives a mask register
value MK from the computation control unit 1103. Further,
the request control unit 1102 sends them to the prefetch unit
105 through the signal line 106 together with the
PRREQ and PRADR. For a request made under a store
instruction, the request control unit 1102 receives
a store instruction execution command ST from
the instruction control unit 1101 and sends it to the
prefetch unit 105 in the same manner as described
above. When a data readout request is made, read data
is supplied from the cache memory unit 3 to the request
control unit 1102 through the signal line 103.

When the computation control unit 1103 receives an
access instruction with respect to a prefetch register
PFR from the instruction control unit 1101, the computation
control unit 1103 generates an access request
signal PFRREQ, and a PFRRN and a PFRWN for designating
the read and write registers. Upon writing, the computation
control unit 1103 reads the general purpose
register GR and sends these to the prefetch unit 105
through the signal line 106. The read contents of PFR
are inputted from the prefetch unit 105 through the signal
line 112.

Further, the computation control unit 1103 detects
an interrupt factor upon instruction execution. When
one is detected, the
computation control unit 1103 generates an INT signal
and notifies it to the prefetch unit 105 through the signal
line 106. In addition, the request control unit 1102 receives
a PDRWAIT signal from the prefetch unit 105
through the signal line 112. When data has not yet arrived
at the prefetched-data register PDR, the request control
unit 1102 puts the issuance of the subsequent memory
request PRREQ on hold in accordance with the signal.

<Examples of data structures to be prefetched>

Before the detailed description of prefetch control,
the data structures to which the prefetch system
of the present embodiment is applied will be described. The
data structures for the present prefetch system are classified
into four types shown in Figs. 13Aa to 13Db. Program
examples that access the respective
structured data are included in the same drawings.

1) Sequential vector

A data structure shown in Fig. 13Aa shows a vector
in which N elements each having a length of l are
sequentially arranged in the memory 1. Its leading address
is &B(1). The respective elements are represented
as B(1), B(2), ..., B(N). A program example that
contiguously accesses these is shown in Fig.
13Ab. In this case, the interval between the elements
accessed in successive loop iterations coincides with the
element length l. Further, a program example that performs
stride access is shown in Fig. 13Ac. In this case, the access
interval is 5 × l.
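
As a small worked example of the two access patterns, the helper below lists the addresses touched by a contiguous loop and by a loop with a stride of 5. The function name and the concrete base address and element length are hypothetical values chosen for illustration.

```python
def vector_addresses(base, length, count, stride=1):
    """Addresses of `count` elements of size `length`, taking every stride-th one."""
    return [base + i * stride * length for i in range(count)]


# Contiguous access: the interval equals the element length.
print(vector_addresses(base=1000, length=8, count=3))
# Stride access: the interval is 5 times the element length.
print(vector_addresses(base=1000, length=8, count=3, stride=5))
```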

2) Indexed vector 

The present data structure is composed of two sequential
vectors L(i) (1 ≤ i ≤ N) and B(j) (1 ≤ j ≤ M)
shown in Fig. 13Ba. The element lengths and leading
addresses of L(i) and B(j) are respectively represented as
lL and &L(1), and lB and &B(1). In the indexed vector,
the elements of the first level (hereinafter called "indexed
vector or list vector") L(i) respectively show element numbers
of the second level (hereinafter called "target vector") B(j).

A program example that accesses the indexed vector
is shown in Fig. 13Bb. In the example, the
indexed vector is accessed element by element in
contiguous order. A single indexed vector is occasionally
used for the access of a plurality of target vectors, depending
on the user code. The present embodiment can be
applied even in this case.
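
The two-level access can be sketched as an address calculation: each element of the first-level vector L names an element number of the target vector B. The function name and the sample values are assumptions; element numbers are taken as starting at 1, following the B(1) notation in the text.

```python
def indexed_addresses(first_level, b_base, b_length):
    """Address of B(L(i)) for each element L(i) of the first-level vector."""
    return [b_base + (j - 1) * b_length for j in first_level]


L = [3, 1, 2]  # element numbers of the target vector, in access order
print(indexed_addresses(L, b_base=2000, b_length=8))
```

The same first-level vector can be reused with a different `b_base` to address a second target vector.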

3) Array of simple linked lists 

The present data structure is composed of a set
of data represented in plural levels. The most significant
level corresponds to a sequential vector with a
leading address a and an element length lL. Each of
its elements b, b', ..., b" holds the leading address of
a table in which data of a second level is placed. Data
of the second and lower levels is held
in a table, and its position is designated by
the leading address of the table and a displacement
from the head. The present embodiment is
intended for the case where the displacement is held
constant for all the data within the same level. However,
an extension for allowing the displacement to vary for
each data can also be carried out. Each data of
the second and lower levels, exclusive of the least
significant level, shows the leading address of the table
in which data of the next level is stored.

In the example shown in the drawing, the most significant
leading element b of the vector shows a table C
including data c of a second level. The data c is placed
in a position corresponding to a predetermined displacement
j. The data c shows a table D including data d of
a third level and the data d is placed in a position
corresponding to a predetermined displacement k. The data
d represents a table E including data e of the least
significant level and the data e is placed in a position
corresponding to a predetermined displacement l. An
example of a program that accesses the array of the simple
linked lists is shown in Fig. 13Cb. In addition to this
example, the present function is suited to accessing
an array of structs, which is easily described in the C language.
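
The chain b, c, d, e of this example can be followed with a short sketch. Memory is modelled as a dictionary from address to word, and the function names and all addresses are hypothetical; only the rule "table address plus fixed displacement per level" comes from the text.

```python
def follow_chain(memory, table, displacements):
    """Walk one linked list: at each level, load the word found at
    (table address + that level's fixed displacement)."""
    value = table
    for disp in displacements:
        value = memory[value + disp]
    return value


def array_of_lists(memory, a, length, count, displacements):
    """Lowest-level data reached from each of `count` top-level elements."""
    return [follow_chain(memory, memory[a + i * length], displacements)
            for i in range(count)]


memory = {
    0: 100,    # b: leading address of table C
    102: 200,  # c at displacement j = 2 in table C: address of table D
    201: 300,  # d at displacement k = 1 in table D: address of table E
    303: 42,   # e at displacement l = 3 in table E
}
print(array_of_lists(memory, a=0, length=1, count=1, displacements=[2, 1, 3]))
```

Since the displacements are the same for every top-level element, one short list of displacements is enough to describe the whole array.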

4) Array of trees

The present data structure is substantially similar to
the array of the simple linked lists described in
paragraph 3). As shown in Fig. 13Da, the data structure is
composed of a set of data of a plurality of levels. The
most significant level is a sequential vector with
a leading address a and an element length lL. The
respective elements b, b', ..., b" respectively hold leading
addresses of tables in which data of the second level is
placed. Data of the second and lower
levels is data in the tables. Its position is designated
by the leading address of each table and a displacement
from the head.

The array of the trees differs from the array of the
simple linked lists described in paragraph 3) in that
a plurality of data of the second and lower levels exist
for the corresponding data one level
higher. The plurality of data are placed in the same
table and each displacement from the head of the table
is designated in advance. The present embodiment is
intended for the case where the displacement is held
constant for all the data within the same level. However,
an extension for allowing the displacement to vary for
each data can also be carried out. Each data of
the second and lower levels, exclusive of the least
significant level, shows the leading address of the table
in which data of the next level is stored.

In the example shown in the drawing, the most significant
leading element b of the vector shows a table
CD including data c and d of the second level. The data c
and d are placed in positions corresponding to predetermined
displacements j and k. The data c shows a table
E including data e of the least significant level and
the data e is placed in a position corresponding to a
predetermined displacement l. The data d represents a table
F including data f of the least significant level and
the data f is placed in a position corresponding to a
predetermined displacement m. An example of a program
that accesses the array of the trees is shown in
Fig. 13Db.
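
The tree example can be sketched in the same spirit. The nested dictionary used to describe the displacements (`shape`), the function name and all addresses are assumptions made for illustration; memory is again a dictionary from address to word.

```python
def walk_tree(memory, table, shape):
    """Collect the lowest-level data reachable from the table at `table`.
    `shape` maps each displacement to None (lowest level) or to the
    shape of the table that the loaded word points to."""
    leaves = []
    for disp, sub in shape.items():
        value = memory[table + disp]
        if sub is None:
            leaves.append(value)                          # data e or f
        else:
            leaves.extend(walk_tree(memory, value, sub))  # value is a table address
    return leaves


memory = {
    100: 200,  # c at displacement j = 0 in table CD: address of table E
    101: 300,  # d at displacement k = 1 in table CD: address of table F
    202: 11,   # e at displacement l = 2 in table E
    303: 22,   # f at displacement m = 3 in table F
}
shape = {0: {2: None}, 1: {3: None}}
print(walk_tree(memory, table=100, shape=shape))
```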

<Instruction formats>

Figs. 17A to 17D show instruction formats employed
in the present embodiment. Fig. 17A shows a
general instruction format. An instruction comprises a
basic part of 32 bits and an extension part of 16 bits.
The basic part conforms to the PA-RISC architecture of
U.S. Hewlett-Packard Company. Fig. 17B shows a basic
part format of a memory access instruction. Symbols op,
b, t/r, s and im14 represent an instruction code, a base
register number, an operand register number, a space
register designation and a 14-bit immediate, respectively.
Fig. 17C indicates a basic part format of a computation
instruction. Symbols op1 and op2 respectively designate
instruction codes, symbols r1 and r2 respectively
designate operand register numbers, symbol t designates
a register number at which the result is stored,
and symbols c and f respectively designate instruction
suppress conditions. For detailed information, refer to:
"PA-RISC 1.1 Architecture and Instruction Set Reference
Manual", Second edition, pp. C-1 to C-6, Hewlett-Packard,
HP Part Number: 09740-90039, 1992.

Fig. 17D shows a format of an extension part. Symbols
rm and tm respectively indicate a read mask
number and a write mask number both of which desig- 
nate a mask register MR shown in Fig. 1 S. The mask 
register is composed of 62 one-bit registers. The mask 
register stores conditional values corresponding to the 
result of computation by a computation instruction. 
Namely, upon execution of the computation instruction, 
a condition corresponding to the result of computation 
by the computation instruction is stored in the mask register
designated by tm in accordance with the designation
of the c and f fields of the basic part. If the value of
the register designated by rm is 1, then the instruction
designated by the basic part is executed. If the value
is 0, the instruction is suppressed. Controlling
the presence or absence of the instruction execution by
the mask register is called "conditioned execution control".
When the number 0 or 1 is designated by rm, the
mask value 0 or 1, respectively, is always read out. Further,
when the number 0 or 1 is designated by tm, the contents
of each mask register remain unchanged.
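
Conditioned execution control can be modelled in a few lines. The sketch assumes that the 62 one-bit registers are numbered 2 to 63, which the text does not state explicitly; the class and function names are hypothetical.

```python
class MaskFile:
    """Mask numbers 0 and 1 read as fixed values; the rest are one-bit registers."""

    def __init__(self):
        self.bits = {n: 0 for n in range(2, 64)}  # 62 registers (numbering assumed)

    def read(self, rm):
        return rm if rm in (0, 1) else self.bits[rm]

    def write(self, tm, condition):
        if tm not in (0, 1):  # tm = 0 or 1: no mask register changes
            self.bits[tm] = 1 if condition else 0


def execute(masks, rm, operation):
    """Run `operation` only when the mask selected by rm reads 1."""
    return operation() if masks.read(rm) == 1 else None


masks = MaskFile()
masks.write(5, condition=(3 > 2))             # a computation stores its condition
print(execute(masks, 5, lambda: "executed"))  # mask 5 reads 1
print(execute(masks, 0, lambda: "executed"))  # mask 0 always reads 0: suppressed
```

Designating rm = 1 gives an unconditional instruction, and designating tm = 0 or 1 gives a computation that records no condition.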

Specifications of a specific instruction for prefetch 
control will be described below. In the following descrip- 
tion, symbol [m] indicates a read mask field. The exe- 
cution of the instruction is controlled in accordance with 
the contents of a mask register designated by m. The 
next field shows the type of instruction and several op- 
erand fields further continue after the next field. The type 
of instruction and the operand will be described at each 
instruction. 



1) Setup and save/recovery instructions of prefetch
status register (PSR)

○ Move GR to PSR: [m] MVRPSR, s1

The present instruction sets the value of PSR in accordance
with the contents of a general purpose register
(GR) designated by s1. The present instruction is used
for initialization of PSR and to recover PSR at a context
switch such as an interrupt or the like.

○ Move to PSR Immediate: [m] MVPSRI, imm16

The present instruction sets PSR in accordance
with a 16-bit immediate field of the instruction. The present
instruction is used to initialize PSR.

○ Move PSR to GR: [m] MVPSRR, t

The present instruction allows the contents of PSR
to be stored in GR designated by t. The present instruction
is used to obtain access to PSR.

2) Setup and save/recovery instructions of prefetch
control register (PCR)

○ Move GR to PCR: [m] MVRPCR, s1, t

The present instruction sets the value of the PCR designated
by t in accordance with the contents of the GR designated
by s1. The present instruction is used for initialization
of PCR and to recover PCR at a context switch such as
an interrupt or the like.

○ Move PCR to GR: [m] MVPCRR, s1, t

The present instruction allows the contents of the PCR
designated by s1 to be stored in the GR designated by t.
The present instruction is used for access to PCR and for
saving of PCR at a context switch such as an interrupt
or the like.

3) Control instructions

The following are used as instructions for updating
specific fields of PSR and PCR alone.

○ Set PCR ACT: [m] SPCRACT, imm8, t

The present instruction updates the ACT flag (Fig.
14A) of the PCR designated by t in accordance
with the value of the imm8 field.

○ Set PSR SUSP: [m] SPSRSUSP, imm8

The present instruction updates an SUSP flag (Fig. 
2\ of PSR in accordance with the value of the imm8 field. 
○ Initialize PCR: [m] IPCR, all, t

The present instruction resets the VLD and ACT flags
(Fig. 14A) of the PCR designated by t. If, however, all is 1,
the present instruction resets all the PCRs.

○ Initialize PDR: [m] IPDR

The present instruction resets the RC fields of
all the elements of all PDSRs to 0.

<Prefetch unit> 

Fig. 2 illustrates major signals transferred between 
four units forming the prefetch unit 105 and between 
these and apparatus components other than the 
prefetch unit. The detailed meanings of the respective
signals and the method of generating them will be
described later with reference to the explanatory
drawings for each unit. Schematic configurations
of the respective units and their operations will
now be described using only the major ones of the signals
shown in Fig. 2.

Reference numeral 201 indicates a prefetch status
control unit for effecting read/write control on the
prefetch status register (PSR) 301 and the prefetch control
register (PCR) 302. The PSR and PCR are collectively
called the "prefetch register (PFR)". When
the prefetch is to be operated, the prefetch status
control unit 201 sets information designating the
target data structure and the prefetch operation mode in
the PFR under the execution of a program by the processor
2.

When the PFR is set by the program, the information to be
set is stored in a general purpose register (GR) in
the processor 2 in advance. Next, the information is
written into the PFR designated by the aforementioned
prefetch control instruction. With the execution of the
prefetch control instruction at this time, the prefetch unit
105 receives a PFR write command PFRREQ, a PFR
identification number PFRWN and the data in the GR
from the processor 2 through the signal line 106.

Reference numeral 202 indicates a prefetch request
control unit for issuing a prefetch request and updating
a PCR and a prefetched-data status register PDSR,
to be described later, in response to the issuance of
the prefetch request. The prefetch request control unit
202 reads a PSR and a PCR through a signal line 206
and reads a PDSR from a prefetched-data unit 203
through a signal line 210. Further, the prefetch request
control unit 202 checks whether a PCR for which a request
can be issued exists among the 16 PCRs. If the answer is
Yes, then the prefetch request control unit 202 selects the
PCR for which a request is issued next, based on a
predetermined standard to be described later. Thereafter,
the prefetch request control unit 202 sends a prefetch
request PFREQ and a PCR number REQPCRN (whose value
will hereinafter be called i) to the prefetch status control
unit 201 through a signal line 205. The prefetch request
control unit 202 reads the fields related to the request from
the i-th prefetch control register PCRi designated by the
REQPCRN and takes them in through the signal line 206.
Each field of the PCR is hereinafter represented by
applying a subscript i thereto.

Major ones among these are a
prefetch address PFAi and a PDR top pointer PDRTPi
indicative of the position where the read data is stored
in the PDR. These are sent to the cache request unit
101 through the signal line 107 together with the PFREQ
and REQPCRN signals. The state of the corresponding
PDSR in the prefetched-data unit 203 is displayed as
request issued through a signal line 209. The PDSR
position for this is designated by the REQPCRN and
PDRTPi. The initialization information includes at least
part of PFAi and the number of accesses NRi to be
described later.

With the issuance of the PFREQ, the prefetch request
control unit 202 next generates updated values
(hereinafter, updated values of respective fields of the
PCR and PDSR are denoted by appending ' to the field
names) of the fields of PCRi related to the request
issuance, such as the PFAi and PDRTPi or the like, and
sends them to the prefetch status control unit 201 through
the signal line 205. In order to generate the updated value
of the prefetch address PFAi, an address modifier MODi
or the like, to be described later, is read from the PCRi.
When the processing of a previously accepted prefetch
request is not yet completed, the cache request unit 101
sends a busy signal REQBSY, indicative of a state in
which a request cannot be accepted at present, to the
prefetch request control unit 202 through the signal line
120.
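
One issuing step of this unit might be sketched as follows. The update rules are assumptions made for illustration only: the text says that MODi is used to generate the updated address PFAi', but the simple addition below and the wrap-around of PDRTPi are guesses, and the dictionary layout and function name are hypothetical.

```python
def issue_prefetch(pcr, depth, cache_busy):
    """Try to issue one prefetch request for the selected PCR.
    Returns (prefetch address, PDR position), or None while REQBSY is on."""
    if cache_busy:  # REQBSY: a previous request is still being processed
        return None
    request = (pcr["PFA"], pcr["PDRTP"])
    pcr["PFA"] = pcr["PFA"] + pcr["MOD"]        # PFA' (assumed: PFA + MOD)
    pcr["PDRTP"] = (pcr["PDRTP"] + 1) % depth   # PDRTP' (assumed wrap-around)
    return request


pcr = {"PFA": 1000, "MOD": 8, "PDRTP": 3}
print(issue_prefetch(pcr, depth=4, cache_busy=False), pcr)
```

When the busy signal is on, the fields are left untouched, so the same request is retried later.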

The prefetched-data unit 203 includes a plurality of
prefetched-data registers (PDR) 704 for temporarily
storing prefetched data therein and prefetched-data status
registers (PDSR) 705 for storing read control information
thereof therein. When the prefetched data is
read out of the cache memory 1001, the prefetch unit
105 takes in the data delivery instruction CADV, the
CWPDRN indicative of the storage PDR number and the
CPDRIP indicative of the element position in the PDR
having that PDR number through the signal line 120
and stores therein the data CDATA delivered from the
cache memory 1001 in accordance with the information
thus taken in.

When the prefetched data is read from the memory
1, the prefetch unit 105 takes in the data delivery instruction
MADV, the MWPDRN indicative of the storage PDR
number and the MPDRIP indicative of the element position
in the PDR having that PDR number and
stores therein the data MDATA in accordance with the
information thus taken in. With the storage of the data
therein, the state of PDSR is displayed as data arrived.

In order to pause the prefetch operation upon
generation of an interrupt by the processor 2, the
prefetched-data unit 203 sends a status signal PFBSY
for identifying the completion of prefetch request
processing to the prefetch status control unit 201 through
a signal line 213.

Reference numeral 204 indicates a prefetched-data
read control unit for controlling the transfer of data
temporarily stored in the corresponding prefetched-data
register PDR to the processor 2. The prefetched-data
read control unit 204 takes in the memory request
PRREQ and the memory access instruction decode
information LD through the signal line 106. One feature of
the present embodiment is that the base register
number BRN of the register in which the operand address
for this instruction is stored is taken in at the same time
and is associated with each PDR as a key.

Namely, since the base register number designated by
the memory access instruction is stored in an instruction
identifier IID field of the PCR corresponding to
the PDR associated with the base register number BRN,
it is possible to identify that the corresponding PDR is
associated with the base register number by comparing
the above BRN with the stored base register number.
For this purpose, the prefetched-data read control unit 204
takes in the values of the IID fields of all the PCRs through
a signal line 208. The prefetched-data read control unit
204 sends the thus-identified PDR number to the
prefetch status control unit 201 as an RPDRN signal
(whose value is regarded as m) through a signal line
207. Further, the prefetched-data read control unit 204
reads a PDR out pointer field PDROPm (whose value
is regarded as n) from the designated PCRm and takes
it in through the signal line 208. The prefetched-data
read control unit 204 sends a PDR read instruction PDRREQ,
a read PDR number RPDRN and the above
PDROPm to the prefetched-data unit 203 through a signal
line 212. Thereafter, the prefetched-data read control
unit 204 reads out data DATAm(n) from the n-th element
of the m-th PDR and sends it to the cache data unit
102 through the signal line 110. At this time, the
prefetched-data read control unit 204 checks the validity
of the data stored at the PDR position and sends a
PDRHIT signal indicative of the validity of the data to the
cache memory unit 3 through the signal line 111, the
cache data unit 102 through the signal line 110 and the
cache request unit 101 through the signal line 107,
respectively. Thus, the cache request unit 101 suppresses
a memory request operation based on the memory access
instruction.

If the data has not arrived at the corresponding PDR
position, then the prefetched-data read control unit 204
generates a PDRWAIT signal and sends it to the cache
request unit 101 and the processor 2. In response to this
signal, the cache request unit 101 holds the cache access
request made under the memory access instruction.
The PDRWAIT signal is held
until the corresponding prefetched data arrives, and is
released upon arrival. At this time, the
prefetched-data read control unit 204 checks the validity
of the data and generates a PDRHIT from the result of the
check in the same manner as described above. When
the data in the PDR is invalid, the PDRHIT is not
established and hence the memory request operation
made under the memory access instruction is processed
without being suppressed. It is thus possible to
read the up-to-date data from the cache or the memory 1.
As a result, the problem related to the invalidation of data
due to updating of the prefetched data by a store
instruction can be resolved. Incidentally, the processor 2
that has received the PDRWAIT holds subsequent
memory requests while the signal is on.
The prefetched-data unit 203 reads respective fields
RC, RI, DA and DI of each PDSR, to be described later,
through the signal line 212 to generate the above
PDRHIT and PDRWAIT and inputs them to the
prefetched-data read control unit 204.
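
The three outcomes of a read from a PDR element can be summarised in a short sketch. The two boolean arguments stand for the "data arrived" and "data invalid" indications kept for the element; their names and the function name are assumptions for illustration.

```python
def read_pdr_element(data_arrived, data_invalid):
    """Return (PDRHIT, PDRWAIT) for the addressed PDR element."""
    if not data_arrived:
        return (False, True)   # wait until the prefetched data arrives
    if data_invalid:
        return (False, False)  # no hit: the normal cache/memory access proceeds
    return (True, False)       # hit: data comes from the PDR, request suppressed


for state in [(False, False), (True, True), (True, False)]:
    print(state, read_pdr_element(*state))
```

Only the last case suppresses the memory request, which is why an element invalidated by a store is still read correctly from the cache or memory.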

Upon execution of a store instruction, the prefetched-data unit
203 takes in the memory request PRREQ, the store instruction
execution command ST and the operand address PRADR through the
signal line 106. The operand address is then compared with at least
part of the prefetch address held for each element of each PDR in
the corresponding PDSR. When they coincide, the corresponding PDR
element is marked as holding invalid read data, since the data may
have been updated.
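
The store check described above can be sketched as follows. This is
a minimal illustration in Python, not the circuit of the embodiment.
The names TrackedElement, mdkey_of and on_store are hypothetical, and
the sketch assumes that the partial address compared is the low-order
16 bits of the prefetch address (the MDKEY described later). Because
only part of the address is compared, an unrelated store whose
low-order bits happen to match also marks the element invalid, which
is safe but conservative.

```python
from dataclasses import dataclass
from typing import List

MDKEY_BITS = 16  # assumption: low-order 16 bits of the prefetch address


@dataclass
class TrackedElement:
    rc: int = 0       # unreferenced count; the other fields matter only if rc != 0
    mdkey: int = 0    # partial prefetch address kept for comparison
    di: bool = False  # data invalid flag


def mdkey_of(address: int) -> int:
    """Return the partial address used for the comparison."""
    return address & ((1 << MDKEY_BITS) - 1)


def on_store(elements: List[TrackedElement], store_address: int) -> int:
    """Mark every live element whose partial address matches the store.

    Returns the number of elements newly marked invalid.
    """
    key = mdkey_of(store_address)
    marked = 0
    for element in elements:
        if element.rc != 0 and element.mdkey == key and not element.di:
            element.di = True
            marked += 1
    return marked
```

An element whose RC is 0 is skipped because its status flags are not
meaningful until a new request initializes them.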

More detailed structures of the respective units that 
form the prefetch unit 105 will be described below. 



(Prefetch status control unit 201)

Fig. 3 shows the details of the prefetch status con- 
trol unit 201. 

Reference numeral 301 indicates the prefetch status register
(PSR), which holds a SUSP flag indicating that the issuance of
prefetch requests is stopped, a prefetch request busy flag PFBSY
indicating the presence or absence of an outstanding prefetch
request, and a PDRDPT indicating the depth of each PDR. When the
PFRREQ signal is on and the PFRWN designates the PSR, data is
written into the PSR 301. At this time, the contents of the GR
delivered from the processor 2 are written into the PSR 301.
However, no data is written into the PFBSY flag. The PSR 301
receives the INT signal, by which the processor 2 notifies the
generation of an interrupt, and sets the SUSP flag in response to
it. The PSR 301 also receives the PFBSY signal from the
prefetched-data unit 203 through the signal line 213 and reflects it
in the PFBSY flag.

The processor 2 reads the PSR by designating the PSR with a read
address PFRRN in accordance with a program; a selector 318 then
selects the contents of the PSR. The PFRRN is supplied to the
selector 318 through a signal line 319. The output of the selector
318 is sent to the processor 2 through the signal line 112. Further,
the contents of the PSR are sent to the prefetch request control
unit 202 through the signal line 206.

Meanings of the respective fields in the PSR are as follows:

O Pause flag: SUSP 

The SUSP indicates that the prefetch mechanism according to the
present embodiment is in a request stop state. When the SUSP is on,
no prefetch request is issued even if active prefetch control
registers (PCR) exist. When the SUSP is off, prefetch requests can
be issued. When an interrupt is generated, for example because of a
storage protection exception, the processor 2 notifies the prefetch
status register of it with the INT signal through the signal line
106 so as to set the SUSP flag of the PSR. The SUSP can also be set
by the program.

O Prefetch request busy flag: PFBSY 

The PFBSY indicates that a prefetch request exists whose data
reading has not yet completed. After the SUSP has been set on upon
generation of an interrupt to stop the issuance of requests, the
turning off of the PFBSY flag is used to identify that the prefetch
mechanism has been brought into a pause state.

O PDR depth: PDRDPT 

The PDRDPT shows the number of elements in each of the 16 PDR.
Thus, 16 x PDRDPT register elements, each 8 bytes wide, are
provided in the entire apparatus for temporary storage of
prefetched data.

The SUSP and PFBSY flags will be described in detail in connection
with interrupt processing later.

Reference numeral 302 indicates 16 prefetch control registers
PCR0 through PCR15, each storing prefetch control information.
Where no confusion arises, either a single prefetch control register
or all the prefetch control registers will simply be referred to as
PCR. The PCR comprises information describing the positions and
structures of the data to be prefetched, identification information
about the instruction that requests the prefetched data, information
about data read request control, control information for reading out
the data held in each PDR, etc.

Referring to Fig. 3, reference numerals 303, 305 and 307
respectively indicate data write circuits provided so as to
correspond to the PCR. In accordance with the instructions PFRREQ,
PDRREQ and PFREQ, the data write circuits respectively write GR,
PDROPm' and (PDRTPi', PDRQPi', PFAi', SKIPCNTi') into the PCR having
the numbers indicated by PFRWN, RPDRN and REQPCRN through signal
lines 304, 306 and 308.

Reference numeral 315 indicates balance (BAL) field update
circuits provided for every PCR. Each of the update circuits 315
receives the balance field BAL of its corresponding PCR through a
signal line 316, a BAL increment instruction accompanying the
issuance of PFREQ through the signal line 308 and a BAL decrement
instruction accompanying the issuance of PDRREQ through the signal
line 306. The BAL is increased by one in response to the increment
instruction and decreased by one in response to the decrement
instruction. When both instructions are received simultaneously, the
value is left unchanged. The updated value is set to the
corresponding balance BAL through a signal line 317. Thus, the
balance BAL shows the number of PDR elements for which a request has
been issued but which have not yet been read by the processor 2.
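
The behaviour of one BAL update circuit can be sketched as below.
This is an illustrative Python model only; the function name
update_bal and its boolean arguments are hypothetical stand-ins for
the increment instruction (accompanying PFREQ) and the decrement
instruction (accompanying PDRREQ).

```python
def update_bal(bal: int, increment: bool, decrement: bool) -> int:
    """Return the next BAL value for one PCR.

    increment: a prefetch request (PFREQ) was issued for this PCR.
    decrement: the processor read an element (PDRREQ) of this PCR.
    """
    if increment and not decrement:
        return bal + 1
    if decrement and not increment:
        return bal - 1
    # Neither instruction, or both at once: the value is unchanged.
    return bal
```

With this rule BAL always equals the number of requests issued minus
the number of elements already read by the processor.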

Reference numerals 309, 311 and 313 indicate PCR data read
circuits. The data read circuits 309, 311 and 313 receive the
contents of the PCR through signal lines 310, 312 and 314, select
the PCR having the numbers indicated by PFRRN, RPDRN and REQPCRN,
and output the contents (PDROPm, MOEm, RDCm) and (ORGi, LLi, PDRTPi,
PDRQPi, LPCRi, NRi, PFAi, MODi, DLi, BUFi, SKIPi, SKIPPi, SKIPCNTi
and SKIPGAPi) of the PCR, which are set in double-word units,
through a signal line 320, the signal line 205 and the signal line
206. Further, the VLD and IID fields of all the PCR are output to
the signal line 205 and, similarly, all the VLD, ACT, LA, LAC, BAL
and ORG fields are output to the signal line 206.
A single PCR is composed of four double words as shown in Figs.
14A to 14D. A context switch accompanying an interrupt or the like
requires the program to save and restore the related PCR. Since the
fields are classified into groups according to function, each group
is assigned to its own double word, and the program accesses the PCR
in double-word units, groups related to functions not in use can be
excluded from saving and restoring. This has the advantage of
reducing the number of program execution steps needed for interrupt
processing and initialization.

Information is transferred between the GR and the PCR in
accordance with the aforementioned dedicated transfer instruction.
Incidentally, in addition to access in double-word units via the GR,
a PCR access to an individual field can also be provided. Since
these can easily be realized by ordinary techniques, their detailed
structures are not illustrated.

The meanings of the respective fields of the PCR will be described
below.

1) First group: basic prefetch control information

O Valid flag: VLD

The VLD shows that the corresponding PCR is valid under the present
context.






O Active flag: ACT 

The ACT represents that the corresponding PCR is 
in an active state. When the PCR is active, a prefetch 
operation corresponding to designated data is execut- 
ed. 

O Original flag: ORG 

The ORG indicates that the corresponding PCR designates a
sequential vector or an indexed vector, or is the first PCR of a
PCR link when prefetching a linked list. A PCR whose ORG is on may
be called an origin PCR and the others linked PCR.

O Prefetched-data length: DL 

The DL designates the operand length of the memory access
instruction whose data is to be prefetched. The operand length is
specified as the exponent obtained when the width is expressed as a
power of 2.

O Prefetch buffer designation: BUF 

The BUF designates the location where prefetched data is held,
i.e., the prefetch buffer. When the BUF is off, data is stored only
in a PDR used as the prefetch buffer. When the BUF is on, data is
stored in both a PDR and the cache. Namely, upon a cache miss, a
single data item is requested of the memory 1 when the BUF is off,
whereas a line transfer is requested of the memory 1 when the BUF is
on.

O Instruction identifier: IID

The IID identifies the memory access instruction that is the object
of the corresponding prefetch. The base address register number of
the memory access instruction is used for this identification. The
present field is compared with the base address register number upon
execution of a memory access instruction. When they coincide, the
prefetched-data register transfers data to that memory access
instruction.

O Look-ahead control designation: LA 

The LA enables the look-ahead (prefetch) control function. When the
function is enabled, the look-ahead count LAC described below
designates how many executions ahead of the corresponding memory
access instruction prefetching is allowed.

O Look-ahead count: LAC 

The LAC designates the maximum number of executions by which
prefetching is allowed to run ahead of the corresponding memory
access instruction.

O Address skip designation: SKIP

The SKIP enables the address skip function. When the address skip
function is enabled, the prefetch address is advanced by the length
designated by the skip gap SKIPGAP, described below, each time the
number of data prefetch requests counted in the skip count SKIPCNT,
described below, reaches the designated number.

O PDR top pointer: PDRTP

The PDRTP indicates the position of the element in a PDR that holds
the data for the latest request.

O Processing request counter: BAL

The BAL indicates the number of requests being processed for which
data transfer has not yet been performed.

O Number of references: NR

The NR indicates the number of times the same data read out under
the designation of the corresponding PCR is referred to. A reference
here includes a data read as an operand by the corresponding memory
access instruction, and a data fetch designated by a baby PCR that
needs the data as an address or modifier (both of which will be
called linked data). The NR is a number determined by the data
structure. In the present embodiment it can be designated up to a
maximum of 16. Each data item stored in the PDR is regarded as used
and finished with after it has been referred to the number of times
designated by the NR. Refer to the RC field in each PDSR to be
described later.

O Mask override designation: MOE

The MOE designates whether a data read request made to a PDR upon
execution of a memory access instruction depends on the value of the
mask register, which indicates the condition for executing that
memory access instruction. If an IF statement exists in a loop and
the memory access is executed depending on whether the IF condition
holds, setting the MOE on causes a dummy read of the PDR to be
performed when the mask value is 0.

O PDR out pointer: PDROP

The PDROP indicates the position of the element in a PDR that holds
the data to be transferred when the corresponding memory access
instruction is executed.

O Redirection data counter: RDC 

The RDC indicates the number of elements, among the valid data held
in a PDR, that are to be redirected by the prefetch buffer
redirection function described later. This will be described in
detail in connection with interrupt processing.

2) Second group: option information 

O Prefetch address: PFA 

The PFA represents the memory address for prefetch. In the case of
an origin PCR (sequential vector), it is advanced to the next
address when a request is issued. Thus, the value read out from the
PFA field is used as it is as the prefetch address when the next
prefetch request is issued. In the case of a linked PCR, an address
generated by a predetermined method and stored in the PFA field is
used as the prefetch address.

3) Third group: option information 

O Address modifier: MOD 

The MOD is a 64-bit address modifier. In the case of a sequential
vector, the MOD designates the interval between data elements. In
the case of an indexed vector, the MOD designates the displacement
from the leading address of the target vector, whereas in the case
of a linked list it designates the displacement from an address in a
linked table. The details of the MOD will be described later with
reference to Figs. 15A to 15C.

4) Fourth group: option information 

O Linked list prefetch designation: LL 

The LL shows that the corresponding PCR is a baby PCR that
designates prefetch in a linked list. In this case, a memory address
is generated by adding the linked data read out from the parent PCR
to the address modifier of the corresponding PCR, and is held in the
PFA.

O Link PCR number: LPCR 

The LPCR shows the number of the parent PCR, and is used for
prefetch of an indexed vector and a linked list. In general, a
plurality of baby PCR can be designated with a common PCR as their
parent. Thus, for example, when two types of target vectors are
referred to using a common indexed vector, the prefetch can be
applied even when the program reads the index for generating the
addresses of the two target vectors from a GR, into which the index
was loaded for one of the target vectors, rather than from the
memory 1. This is one feature of the present embodiment.

O PDR request pointer: PDRQP

The PDRQP indicates the position of the element in the parent PDR
that holds the data for obtaining the address-generating index or
the table address in prefetch of an indexed vector or a linked
list.

O Skip interval: SKIPP 

The SKIPP shows the request interval at which the address skip
described later is performed.

O Skip gap: SKIPGAP

The SKIPGAP indicates the value of the address increment at each
address skip. When the address skip is made, the value obtained by
adding the SKIPGAP to the PFA is regarded as the next address.

O Skip count: SKIPCNT

The SKIPCNT indicates the number of requests issued since the most
recently executed address skip.

Next, examples of PCR designations and the data structures based on
them are shown in Figs. 15A to 15C. Fields that have no bearing on
the description are not illustrated. Two PCR are used for the
prefetch of an indexed vector. For the prefetch of simple linked
lists, as many PCR as there are levels are used. (1): The PCR1
designates a method of prefetching a vector B. The PFA designates
the leading address and the address modifier MOD designates the
reference interval. (2): The PCR1 and PCR2 designate a method of
prefetching an indexed vector. The PCR1 corresponds to the indexed
vector and the PCR2 corresponds to the target vector. In particular,
the LPCR field in the PCR2 shows that its parent PCR is the PCR1.
(3): The PCR1 through PCR3 designate a method of prefetching a
three-level linked list. The MOD designated by the PCR2 and PCR3
indicate the displacements from the heads of tables C and D to the
intended data c and d, respectively.

(Prefetch request control unit 202) 

Fig. 4 shows details of the prefetch request control unit 202.
Reference numeral 401 indicates sixteen PCR ready check circuits 0
through 15, which are provided so as to correspond to PCR0 through
PCR15 and check whether the respective PCR are ready to issue a
prefetch request. The PCR ready check circuit k is supplied with the
contents SUSP and PDRDPT of the prefetch status register PSR through
the signal line 206, with VLDk, ACTk, LAk, BALk, LACk, ORGk, LPCRk
and PDRQPk from the prefetch control register PCRk, with the RC and
DA fields from all the PDSR through the signal line 210 and with a
REQBSY from the cache request unit 101 through the signal line 120.

Reference numeral 402 indicates a selector. From the RC and DA
fields of all the PDSR, it selects the status flags RCx(y) and
DAx(y) of the PDSRx corresponding to the PDR whose number (defined
as x) is indicated by LPCRk, for the element whose number (defined
as y) is indicated by PDRQPk, and outputs them to the circuit 403
through a signal line 412. When the PCRk is a linked PCR, the status
flag DAx(y) indicates whether the linked data necessary for the PCRk
to generate a prefetch address has been read into the PDRx belonging
to the parent PCR. Incidentally, the indication of the DAx(y) is
valid only when the unreferenced count RCx(y) is non-zero.

Reference numeral 403 indicates the ready status check circuit,
which checks whether the PCRk is in a ready state and outputs a RDYk
signal indicating the result. In the drawing, the symbols &, # and <
indicate a logical product, a logical sum and a sign of inequality
respectively, and a further symbol indicates a logical NOT. The
conditions under which the RDYk is established are shown within the
block representing the circuit 403 in the drawing. Their meanings
are as follows:

(1) The prefetch mechanism is not in the request stop state (SUSP
is off).

(2) The cache request unit 101 can accept the prefetch request
(REQBSY is off).

(3) The corresponding PCR is valid (VLD is on) and in the active
state (ACT is on).

(4) When the look-ahead control is designated (LA is on), the number
of processing requests (BAL) has not reached the look-ahead count
(LAC). When the look-ahead control is not designated (LA is off),
the BAL has not reached the depth of the PDR (PDRDPT).

(6) The corresponding PCR is the origin PCR (ORG is on), or, when it
is not the origin PCR (ORG is off), the linked data necessary for it
to generate the prefetch address has been read into the
corresponding PDR (DAx(y) is on).
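
The listed conditions can be combined into a single predicate, as in
the following sketch. It is an illustrative Python model rather than
the logic drawn in the block of circuit 403: the class ReadyInputs
and the function rdy are hypothetical names, and the sketch assumes
that all listed conditions must hold at once and that DAx(y) counts
only while RCx(y) is non-zero, as stated for the selector 402.

```python
from dataclasses import dataclass


@dataclass
class ReadyInputs:
    susp: bool           # PSR: request stop state
    reqbsy: bool         # cache request unit cannot accept a request
    vld: bool            # PCRk is valid
    act: bool            # PCRk is active
    la: bool             # look-ahead control designated
    bal: int             # requests issued but not yet read by the processor
    lac: int             # look-ahead count
    pdrdpt: int          # PDR depth
    org: bool            # PCRk is an origin PCR
    rc_xy: int = 0       # RCx(y) of the parent element (linked PCR only)
    da_xy: bool = False  # DAx(y) of the parent element (linked PCR only)


def rdy(s: ReadyInputs) -> bool:
    """Return True when PCRk may issue a prefetch request."""
    limit = s.lac if s.la else s.pdrdpt
    linked_data_present = s.rc_xy != 0 and s.da_xy
    return (
        not s.susp
        and not s.reqbsy
        and s.vld
        and s.act
        and s.bal < limit
        and (s.org or linked_data_present)
    )
```

For a linked PCR the predicate therefore stays false until the
parent element actually holds the linked data.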

The ready signals RDY0 through RDY15 associated with the respective
PCR are sent to a prefetch request issue circuit 404, which selects
from among them the PCR that will issue the next request. Reference
numeral 405 indicates a requested-PCR ID register, which holds the
ID of the PCR that most recently issued a request and notifies the
prefetch request issue circuit 404 of it through a signal line 406.
The prefetch request issue circuit 404 checks the ready states of
the PCR in order, starting from the PCR following the one that most
recently issued a request. It selects the first ready PCR found as
the one to issue the next request and outputs its ID as a request
PCR ID REQPCRN. The RDY signals are checked in increasing order of
PCR ID, and checking continues from the 0th signal after the
fifteenth. When a ready PCR exists, the prefetch request issue
circuit 404 issues a prefetch request PFREQ. In response to the
PFREQ, it sends the REQPCRN to the requested-PCR ID register 405
through a signal line 407 to update its contents.
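
The selection order described above amounts to a round-robin scan.
The sketch below models it in Python; select_next_pcr is a
hypothetical name, and the sketch assumes that the PCR that issued
the previous request is itself examined last, so that a single ready
PCR can issue requests repeatedly.

```python
from typing import List, Optional

NUM_PCR = 16


def select_next_pcr(rdy: List[bool], last_issued: int) -> Optional[int]:
    """Return the ID of the next PCR to issue a request, or None.

    rdy:         the ready signals RDY0..RDY15.
    last_issued: contents of the requested-PCR ID register.
    """
    for offset in range(1, NUM_PCR + 1):
        k = (last_issued + offset) % NUM_PCR  # wraps from 15 back to 0
        if rdy[k]:
            return k
    return None  # no PCR is ready, so no PFREQ is issued
```

Starting the scan just after the previous requester gives every
ready PCR a turn in ID order before any PCR is served twice.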

Because requests are controlled in this way, the program can
easily and efficiently assign the memory access instructions to be
prefetched in a loop to the PCR in the order in which they are
written. The prefetch request issue circuit 404 sends the PFREQ and
REQPCRN (whose value is defined as i) to the prefetch status control
unit 201 through the signal line 205. The prefetch status control
unit 201 then reads out the contents of the PCRi so designated, and
they are captured in the prefetch request control unit 202 through
the signal line 206.

Reference numeral 408 indicates a pointer update unit, which
controls the updating of the PDRTPi and the PDRQPi in response to
the PFREQ. The pointer update unit 408 takes in the present PDRTPi
and PDRQPi through the signal line 206, adds 1 to each with the
PDRDPT as the modulus, and sends the results as updated values to
the prefetch status control unit 201 through the signal line 205.
These values are written into the corresponding fields of the PCRi
in response to the PFREQ.
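
The modular pointer update makes each PDR behave as a circular
buffer. A minimal Python sketch, with the hypothetical name
advance_pointer, is:

```python
def advance_pointer(pointer: int, pdrdpt: int) -> int:
    """Add 1 to a PDR element pointer with PDRDPT as the modulus."""
    if pdrdpt <= 0:
        raise ValueError("PDRDPT must be positive")
    return (pointer + 1) % pdrdpt
```

With a depth of 4, for example, the pointer visits elements 0, 1, 2,
3 and then returns to 0.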

Reference numeral 409 indicates a prefetch address control circuit,
which sends an updated value PFAi' of the prefetch address and an
updated value SKIPCNTi' of the address skip count to the prefetch
status control unit 201 through the signal line 205.

Reference numeral 410 indicates a PDSR update circuit, which
controls the initialization or updating of the PDSR required upon
issuance of the PFREQ. When the prefetch request is issued, the PDSR
update circuit 410 sends information for initially setting the
status flags of the PDSRi(j) associated with the PDRi(j) that will
store the data to the prefetched-data unit 203 through the signal
line 209. These signals are the PFREQ, REQPCRN, PDRTPi, NRi, PFAi
and the initially-set value of DIi(j). Further, when the prefetch
request is made from a linked PCR, the PDSR update circuit 410
simultaneously sends information for updating the status flag RC of
the PDR that stores the linked data.

The prefetch request control unit 202 sends the prefetch request
and its accompanying information to the cache request unit 101
through the signal line 107. The accompanying information comprises
the PDR number REQPCRN in which the data is to be stored and its
element position PDRTPi, the prefetch address PFAi and the BUFi.

The prefetch address control circuit 409 will be de- 
scribed in more detail using Fig. 5. 
Reference numeral 501 indicates an address adder, which adds its
input data and sends the result to the prefetch status control unit
201 through the signal line 205 as the updated value PFAi' of the
prefetch address. The input data are supplied from selectors 502 and
503 through signal lines 512 and 513 respectively.

The selector 502 selects one of the PFAi, an index X and linked
data DATAp(q) in accordance with control lines SELX and SELL. The
linked data DATAp(q) is input from the prefetched-data unit 203
through the signal line 210. Reference numeral 504 indicates a
shifter, which receives the DATAp(q) and outputs, as the index X,
the result of shifting the data toward the high-order bits by the
number of bits designated by the prefetch length DLi. Reference
numeral 505 indicates a control signal generation circuit for the
selector 502, which receives the ORGi and the LLi and generates the
SELX and SELL. The SELX is the select instruction for the index X
and is on when the target vector of an indexed vector is prefetched.
The SELL is the select instruction for the linked data DATAp(q) and
is on when data at the second or a lower level of a linked list is
prefetched.

The selector 503 selects either the MODi or the SKIPGAPi in
accordance with a control line SELSKIP. The SELSKIP is on when the
address skip is designated (SKIPi is on) and the SKIPCNTi reaches
the SKIPPi. To this end, a comparator 506 compares the SKIPCNTi with
the SKIPPi and sends the result of the comparison to an AND circuit
509 through a signal line 507. The AND circuit 509 turns the output
SELSKIP on when the comparison shows that they coincide and the
SKIPi is on. Reference numeral 508 indicates an incrementer that
adds 1 to the SKIPCNTi with the SKIPPi as the modulus and sends the
updated value of the SKIPCNTi to the prefetch status control unit
201. The prefetch status control unit 201 causes the PCRi to take in
the updated value in response to the PFREQ.

Under the above control, an updated value of the prefetch address
can be generated for each prefetch request and captured in the PCRi
in response to the PFREQ.
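
The data path formed by the adder 501 and the selectors 502 and 503
can be sketched as below. This is an illustrative Python model, not
the circuit itself. The names PcrFields and adder_output are
hypothetical. The text states only that SELX and SELL are generated
from ORGi and LLi, so the sketch assumes SELL is on for a linked PCR
whose LL is on and SELX is on for a linked PCR whose LL is off. It
also assumes that the result wraps at 64 bits. The SKIPCNT
incrementer 508 is left out.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1  # assumption: addresses are 64 bits wide


@dataclass
class PcrFields:
    org: bool         # origin PCR
    ll: bool          # linked list prefetch designation
    pfa: int          # prefetch address
    mod: int          # address modifier
    dl: int           # operand length as an exponent of 2
    skip: bool = False
    skipp: int = 0    # skip interval
    skipcnt: int = 0  # requests since the last skip
    skipgap: int = 0  # address increment at a skip


def adder_output(pcr: PcrFields, linked_data: int = 0) -> int:
    """Return the value produced by the address adder for one request."""
    sell = (not pcr.org) and pcr.ll      # assumed derivation of SELL
    selx = (not pcr.org) and not pcr.ll  # assumed derivation of SELX
    if selx:
        left = linked_data << pcr.dl     # shifter: index X
    elif sell:
        left = linked_data               # linked data DATAp(q)
    else:
        left = pcr.pfa                   # sequential case
    selskip = pcr.skip and pcr.skipcnt == pcr.skipp
    right = pcr.skipgap if selskip else pcr.mod
    return (left + right) & MASK64
```

For a target vector of 8-byte elements (DL = 3) whose MOD holds the
leading address 0x2000, an index value of 5 gives 0x2000 + (5 << 3)
= 0x2028.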

The PDSR update circuit will be described in detail with reference
to Fig. 6. When a PFREQ is issued from the origin PCR (ORGi is on),
the status information RC, RI, DA, DI and MDKEY of the PDSRi(j) is
initially set. Namely, the number of references NRi is set to the
RCi(j), and 1, 0 and 0 are set to the RIi(j), DAi(j) and DIi(j)
respectively. In the present embodiment, the low-order 16 bits of
the PFAi are set to the MDKEYi(j). For this purpose, the PDSR update
circuit 410 captures the PFREQ and REQPCRN from the prefetch request
issue circuit 404 through a signal line 411 and takes in the PDRTPi,
NRi and PFAi from the prefetch status control unit 201 through the
signal line 206. The PDSR update circuit 410 sends these as they are
to the prefetched-data unit 203 through the signal line 209.

When the PFREQ is issued from a linked PCR (ORGi is off), it is
necessary to update the status information RC of the PDSR element
that stores the linked data (subtract 1 from it). For this purpose,
the PDSR update circuit 410 takes in the ORGi through the signal
line 206, generates an RC update instruction signal DECRC using a
circuit 601, and outputs it to the signal line 209. Further, the
PDSR update circuit 410 reads out the number LPCRi (regarded as = p)
of the corresponding PDSR and the element position PDRQPi (regarded
as = q) through the signal line 206 and sends them to the
prefetched-data unit 203.

A selector 602 selects the contents DIp of the number indicated by
the LPCRi from the DI fields of all the PDSR and supplies them to a
selector 603. The selector 603 selects the contents DIp(q) of the
element designated by the PDRQPi from the contents DIp and supplies
it to an invalid flag generation circuit 605. The invalid flag
generation circuit 605 generates the initially-set value of the
DIi(j) at the time of the issuance of the PFREQ from the linked PCR.
The condition used by the invalid flag generation circuit 605 means
the following: when the PDR data necessary to generate an address is
invalid (DIp(q) is on) upon issuance of a linked request, the
generated address and the data read out using that address may also
be invalid, so that the DI flag of the corresponding PDSR must be
set on (invalid). The initially-set value is sent to the
prefetched-data unit together with the PFREQ and is set to the
DIi(j).
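
The initial setting of a PDSR element on a prefetch request,
including the way the invalid flag is inherited from the linked
data, can be sketched as follows. This Python model is illustrative
only. PdsrEntry and on_prefetch_request are hypothetical names, and
the sketch assumes that a request from a linked PCR sets RC, RI, DA
and MDKEY in the same way as a request from an origin PCR, differing
only in the DI value and in the decrement of the parent RC.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PdsrEntry:
    rc: int = 0       # unreferenced count
    ri: bool = False  # request issued, data not yet arrived
    da: bool = False  # data arrived
    di: bool = False  # data invalid
    mdkey: int = 0    # low-order 16 bits of the prefetch address


def on_prefetch_request(target: PdsrEntry, nr: int, pfa: int, org: bool,
                        parent: Optional[PdsrEntry] = None) -> None:
    """Initialize PDSRi(j) when a PFREQ is issued for it."""
    if not org and parent is None:
        raise ValueError("a linked PCR needs the parent element PDSRp(q)")
    target.rc = nr
    target.ri = True
    target.da = False
    target.mdkey = pfa & 0xFFFF
    if org:
        target.di = False
    else:
        # The address was built from linked data: inherit its invalid
        # flag, then count this use of the linked data (DECRC).
        target.di = parent.di
        parent.rc -= 1
```

A usage example: if the parent element is already marked invalid,
the element requested through the linked PCR starts out invalid as
well, so the later memory access is not served from the PDR.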

(Prefetched-data unit 203)

The prefetched-data unit 203 will be described in more detail with
reference to Fig. 7. The prefetched-data unit is provided with
sixteen prefetched-data circuits 701 corresponding to the PCR0
through PCR15. Each of the prefetched-data circuits is provided with
a prefetched-data register (PDR) 704 composed of a plurality of data
register elements, each storing 8 bytes of data. The number of data
register elements forming a single PDR is given by the PDRDPT field
of the PSR. The sixteen prefetched-data registers as a whole may
also be called PDR. Data CDATA read out from the cache memory 1001
is stored in its corresponding PDR through the signal line 108. A
data delivery instruction CADV, a PDR number CWPDRN and a storage
element position CPDRIP for that purpose are supplied from the cache
request unit 101 through the signal line 120. Data MDATA read out
from the memory 1 is stored in the corresponding PDR through the
signal line 109. A data delivery instruction MADV, a PDR number
MWPDRN and a storage element position MPDRIP for this purpose are
supplied from the memory request unit 103 through the signal line
121.

To read out data from the PDR when the processor 2 executes an
instruction, a PDRREQ, a RPDRN and a PDROPm are supplied from the
prefetched-data read control unit 204. Each individual PDR supplies
its elements to a selector 710 through a signal line 709. The
selector 710 selects one in response to the PDROPm (regarded as = n)
and supplies the result to a selector 713 through a signal line 712.
The selector 713 selects the data from the desired PDR based on the
RPDRN (regarded as = m). This output data DATAm(n) is sent to the
cache data unit 102 through the signal line 110.

To read linked data from the PDR when a prefetch request is issued
from a linked PCR, the LPCRi and PDRQPi are supplied from the
prefetch request control unit 202 through the signal line 209. Each
individual PDR supplies its elements to a selector 711 through the
signal line 709. The selector 711 selects one in response to the
PDRQPi (= q) and supplies it to a selector 715 through a signal line
714. The selector 715 selects the data from the desired PDR based on
the LPCRi (= p). This output DATAp(q) is sent to the prefetch
request control unit 202 through the signal line 210.

Reference numeral 705 indicates a prefetched-data status register
PDSR, which holds status information for each of the elements
forming the PDR. The meaning of each item of status information held
in the PDSR, its initial setting and its update control will be
described below.

O Unreference count: RC 

The RC indicates how many of the NR references to be made to the
corresponding PDR element have not yet been performed. An element
whose RC has become 0 has been used up and is available for storing
the data to be read out next. When a prefetch request PFREQ for the
corresponding element is issued using a PCRi, the RC is initially
set to the value of the number-of-references field NRi in the PCRi.
For this purpose, a PFREQ, a REQPCRN, a PDRTPi and an NRi are input
to the present field of the PDSR 705. The RC is decremented each
time the data is transferred to the processor 2 and each time it is
referred to upon issuance of a linked request. To perform the update
at the time of the issuance of a linked request, a DECRC, an LPCRi
and a PDRQPi are input to the PDSR 705. Further, a PDRREQ, a RPDRN
and a PDROPm are input to the PDSR 705 to perform the update at the
time of the transfer of the data to the processor 2.

The RI, DA, DI and MDKEY flags described below are meaningful only
when the RC is non-zero.

O Request issued flag: RI

The RI shows that a prefetch request for storing data in the
corresponding element has already been issued and that the data has
not yet arrived. The present flag is turned on upon issuance of the
PFREQ and off when the data arrives. For this purpose, the present
flag of the PDSR 705 receives a PFREQ, a REQPCRN and a PDRTPi for
the time when the request is issued, a CADV, a CWPDRN and a CPDRIP
for the time when the CDATA arrives, and an MADV, an MWPDRN and an
MPDRIP for the time when the MDATA arrives. Since the circuits that
turn the respective flags on and off can easily be configured, their
description is omitted. The same applies to the flags and fields
described below.

O Data arrived flag: DA 

The DA represents that the corresponding data has already
arrived and is being held. The present flag is turned off
when the PFREQ is issued and turned on upon arrival of the
data. Therefore, the present flag of the PDSR 705 is
inputted with the PFREQ, REQPCRN and PDRTPi for coping with
the time when the request is issued, the CADV, CWPDRN and
CPDRIP for coping with the time when the CDATA arrives, and
the MADV, MWPDRN and MPDRIP for coping with the time when
the MDATA arrives, respectively.

O Data invalid flag: DI

The DI shows whether the data read out and held in the
corresponding PDR element is valid. Data held in an
individual element of the PDR is regarded as invalid when
there is a possibility that the data at the corresponding
memory address has been updated before the execution of
the memory access instruction. Further, the data is also
regarded as invalid when an access exception is detected
upon the corresponding data reference.

The initial setting of the present flag at the time of the
issuance of the PFREQ differs according to whether the
corresponding PCR is the origin PCR or a linked PCR. When
it is the origin PCR, the present flag is initially set to
0. When it is a linked PCR, the data invalid flag DIp(q)
of the linked data is used as the initial value. This is
because, when the linked data is invalid, the address
generated from it may also be invalid, and data read from
that address should therefore be regarded as invalid.
Accordingly, an initially-set value DIi(j)' of the present
flag annexed to the PFREQ is generated by the PDSR update
circuit 410 and taken into the PDSR through the signal line
209 as described in Fig. 6. The PFREQ, REQPCRN and PDRTPi
are inputted to the PDSR to designate an update time, a
PDSR number and an element position.

When an access exception is detected by the cache request
unit 101, the cache request unit 101 sends the access
exception EXP along with the CADV signal through the signal
line 120 and sets the present flag. The CADV, CWPDRN and
CPDRIP are inputted to designate the update time, the PDSR
number and the element position.

As will be described in the following MDKEY section, the
present flag is also set when there is a possibility that
the prefetched data has been updated by a store instruction
executed by the processor 2.

Where the data in the PDR is found to be invalid when the
memory access instruction that requests the prefetched data
is executed, the corresponding data is ignored and a memory
request based on the memory access instruction is made
active to read out the data from the cache or the memory 1
as usual. This operation is called prefetch buffer
redirection (PBR) in the present specification and will be
described in further detail later.

O Update detection key: MDKEY 

The MDKEY is a key for detecting whether the data at the
address of each prefetched data has been updated. The
prefetch address, or a part thereof, is set in the MDKEY
upon issuance of the PFREQ. Therefore, the PFREQ, REQPCRN,
PDRTPi and PFAi are inputted to the present field.

Reference numeral 706 indicates four sets of comparators
for detecting the updating, one of which is provided for
each PDR element. In response to the store instruction
execution command ST sent through the signal line 106 from
the processor 2, each comparator compares the operand
address PRADR sent from the processor 2 with the value of
the MDKEY supplied through a signal line 707. When they are
found to coincide with each other, the corresponding PDR
data is regarded as invalid and the DI flag of the
corresponding element is set on (invalid) through a signal
line 708. In the embodiment in which the MDKEY holds only a
part of the prefetch address, only the corresponding
portion of the addresses is compared. In this case, a match
may be detected even when the full addresses do not
actually coincide. However, no problem arises from the
viewpoint of assuring the proper operation of a program,
since the memory access instruction then reads out the data
under the PBR operation.
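
The update detection just described can be illustrated with a short sketch. It is only an illustration under stated assumptions: the names `PdrElement`, `issue_prefetch` and `on_store`, the choice of the low-order 12 address bits as the partial key (`KEY_MASK`), and the rule that only elements with a non-zero RC are compared are all assumptions made for the example and are not taken from the specification.

```python
from dataclasses import dataclass

KEY_MASK = 0xFFF  # assumption: MDKEY keeps only the low-order 12 address bits


@dataclass
class PdrElement:
    """Status of one PDR element, reduced to what update detection needs."""
    rc: int = 0        # unreference count RC
    di: bool = False   # data invalid flag DI
    mdkey: int = 0     # update detection key MDKEY


def issue_prefetch(elem: PdrElement, pfa: int, nr: int) -> None:
    """On PFREQ: record part of the prefetch address PFA as the MDKEY."""
    elem.rc = nr
    elem.di = False
    elem.mdkey = pfa & KEY_MASK


def on_store(elements: list[PdrElement], pradr: int) -> None:
    """On ST: compare the store address with every MDKEY, set DI on a match."""
    for elem in elements:
        # Assumption: elements with RC == 0 hold no live data and are skipped.
        if elem.rc != 0 and elem.mdkey == (pradr & KEY_MASK):
            elem.di = True  # a later load is served by the cache or memory (PBR)


elements = [PdrElement() for _ in range(4)]
issue_prefetch(elements[0], pfa=0x1000_0040, nr=1)
issue_prefetch(elements[1], pfa=0x1000_0048, nr=1)

# The store address differs from both prefetch addresses, but shares its
# low-order 12 bits with the first one, so a conservative match is reported.
on_store(elements, pradr=0x2000_0040)
print([e.di for e in elements])
```

The false match on element 0 costs only performance: the load that would have used the prefetched data is redirected to the cache or memory instead.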

Reference numeral 717 indicates a circuit for ORing all the
RI flags. The circuit sends an output PFBSY to the prefetch
status control unit 201 through the signal line 213. The
PFBSY shows that there is a request that has been issued
but whose data has not yet arrived. A program refers to it
to confirm whether the prefetch operation has been brought
into a pause state upon generation of an interrupt.

Each prefetched-data circuit sends the RC, DA and DI flags
of all the PDSR to the prefetch request control unit 202
through the signal line 210 and transmits the RC, RI, DA
and DI to the prefetched-data read control unit 204 through
the signal line 211.

Fig. 12 shows a state transition diagram which relates to
each element of a PDR and is expressed by the four types of
status flags RC, RI, DA and DI.
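
Because the figure itself is not reproduced here, the following sketch reconstructs the life cycle of one PDR element purely from the flag descriptions in the text. The class `ElementStatus`, its method names and the helper `pfbsy` are hypothetical, and the transitions are an interpretation of the prose rather than a transcription of Fig. 12.

```python
from dataclasses import dataclass


@dataclass
class ElementStatus:
    rc: int = 0        # RC: references still to be made (0 means the element is free)
    ri: bool = False   # RI: request issued, data not yet arrived
    da: bool = False   # DA: data arrived and held
    di: bool = False   # DI: held data must be treated as invalid

    def issue(self, nr: int, di_init: bool = False) -> None:
        """PFREQ issued: RC <- NR, RI on, DA off, DI <- initial value."""
        assert self.rc == 0, "element still holds unreferenced data"
        self.rc, self.ri, self.da, self.di = nr, True, False, di_init

    def arrive(self, access_exception: bool = False) -> None:
        """CDATA or MDATA arrived: RI off, DA on; an EXP sets DI."""
        self.ri, self.da = False, True
        self.di = self.di or access_exception

    def reference(self) -> None:
        """Data transferred to the processor or used by a linked request."""
        if self.rc > 0:
            self.rc -= 1


def pfbsy(elements: list[ElementStatus]) -> bool:
    """PFBSY is the OR of all RI flags."""
    return any(e.ri for e in elements)


elems = [ElementStatus() for _ in range(4)]
elems[0].issue(nr=2)
busy_while_waiting = pfbsy(elems)
elems[0].arrive()
elems[0].reference()
elems[0].reference()
print(busy_while_waiting, pfbsy(elems), elems[0].rc)
```

After the second reference the RC returns to 0, so the element can receive the next prefetched data.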



Fig. 8 indicates the prefetched-data read control unit 204
for reading data from each PDR and controlling the transfer
of the read data to the processor 2 and the like.

The prefetched-data read control unit 204 takes in, through
the signal line 106, the memory request PRREQ, the memory
access instruction decode information LD, the base register
number BRN of the memory access instruction and the mask
register value MK indicative of the execution condition for
the memory access instruction, in response to the execution
of the memory access instruction by the processor 2. When
the MK is off, the corresponding instruction should be
invalid from the viewpoint of the program. Inside the
apparatus, however, processing is performed as if the MK
were on and only the writing of its result is suppressed.
Thus, the prefetched-data read control unit 204 can capture
the information about this instruction as well as the PRREQ
regardless of the value of the MK. Reference numeral 809
indicates a register for storing the taken-in information
therein. When the PRREQ that requests the corresponding
data is issued before the prefetched data arrives, the
register 809 holds the information until the arrival of the
data. Therefore, the PDRWAIT signal to be described later
is inputted to the register 809 as a hold condition.

Reference numeral 801 indicates sixteen sets of
comparators, one corresponding to each PCR. The comparators
respectively compare the IID fields of the PCRs with the
BRN supplied from the register 809 and send the results of
comparison to an encoder 803 and an OR circuit 810 through
signal lines 802. Incidentally, the comparison is made for
valid PCRs (VLD is on) alone. The result of comparison for
each invalid PCR is forced to 0.

The encoder 803 encodes the number of the PCR at which the
coincidence is obtained from the results of comparison and
outputs it as a RPDRN signal.

The output of the OR circuit 810 is a MATCH signal
indicating that a valid PCR associated with the
corresponding memory access instruction exists.
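
The comparator, encoder and OR stages can be sketched as follows. The class `Pcr` and the function `match_pcr` are hypothetical names, and the sketch assumes that at most one valid PCR matches a given base register number (the lowest-numbered match is reported otherwise); the specification does not state how multiple matches would be resolved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pcr:
    vld: bool   # VLD: the PCR is valid
    iid: int    # IID: base register number identifying the instruction


def match_pcr(pcrs: list[Pcr], brn: int) -> tuple[bool, Optional[int]]:
    """Return (MATCH, RPDRN) for the base register number BRN."""
    # One comparison per PCR; an invalid PCR always yields 0 (False).
    hits = [p.vld and p.iid == brn for p in pcrs]
    match = any(hits)                             # OR circuit
    rpdrn = hits.index(True) if match else None   # encoder
    return match, rpdrn


pcrs = [Pcr(vld=False, iid=0) for _ in range(16)]
pcrs[0] = Pcr(vld=True, iid=10)
pcrs[1] = Pcr(vld=True, iid=12)
pcrs[5] = Pcr(vld=False, iid=7)   # invalid, so it must never match

print(match_pcr(pcrs, 12), match_pcr(pcrs, 7))
```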

A selector 804 is supplied with the RC, RI, DA and DI
fields of all the PDSR from the prefetched-data unit 203
and selects, for each field, the contents corresponding to
the PDSR designated by the RPDRN signal. The RCm, RIm, DAm
and DIm outputs produced from the selector 804 are inputted
to a selector 805, which selects, for each field, the
contents corresponding to the element of the PDSRm
designated by the PDROPm sent through the signal line 206.

An incrementer 806 is supplied with the PDROPm, adds 1 to
the PDROPm with a PDRDPT as a modulus and outputs the
result of addition as an updated value PDROPm'.

A decrementer 811 is inputted with the RDCm signal sent
through the signal line 205 and outputs an updated value
RDCm' obtained by subtracting 1 therefrom. When, however,
the RDCm is already 0, it remains at 0.
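
The two update rules amount to a wrap-around increment and a saturating decrement. A minimal sketch follows; the function names are hypothetical, and the depth of 4 used in the example is an assumed value for PDRDPT.

```python
def next_pdrop(pdrop: int, pdrdpt: int) -> int:
    """Advance PDROP by one, wrapping around at PDRDPT."""
    return (pdrop + 1) % pdrdpt


def next_rdc(rdc: int) -> int:
    """Subtract one from RDC, but never go below zero."""
    return max(rdc - 1, 0)


print([next_pdrop(p, 4) for p in range(4)], next_rdc(2), next_rdc(0))
```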

A PDR read request control circuit 506 is inputted with a
PRREQ, an LD and an MK from the register 809, a MATCH from
the OR circuit 810, an RCm(n), an RIm(n), a DAm(n) and a
DIm(n), and a mask override designation MOEm from the
signal line 205. The PDR read request control circuit 506
generates a PDR read request PDRREQ, a PDR hit signal
PDRHIT and a PDR read wait signal PDRWAIT.

The conditions for generating the respective signals are as
shown in the drawing, and their meanings will be briefly
described here. The condition for generating the PDRREQ
will be considered in three parts.

(1) First condition

The present condition is that a memory request based on a
memory access instruction is issued from the processor 2
(PRREQ & LD are established). In this case, the memory
access instruction should be either one whose execution is
allowed (mask MK being on) or one for which the designation
of the mask is allowed to be ignored (MOEm being on).

(2) Second condition 

The present condition is that a valid PCR for 
prefetching data under the corresponding memory ac- 
cess instruction is designated (MATCH is on). 

(3) Third condition 

The present condition is that data has already been read
into the PDRm(n) (RCm(n) ≠ 0 & DAm(n) is established) or
that the reading of data is unnecessary for effecting the
prefetch buffer redirection (RDCm ≠ 0).

In the case of the condition for generating the PDRHIT,
only the third condition differs from that for the PDRREQ,
in the following manner. Namely, the present condition is
that valid data has already been read into the PDRm(n)
((RCm(n) ≠ 0) & DAm(n) & ¬DIm(n) are established) and it
is necessary to read the data (RDCm = 0).

In the case of the condition for generating the PDRWAIT,
only the third condition differs from that for the PDRREQ,
in the following manner. Namely, the present condition is
that a request is not yet issued (RCm(n) = 0) or the data
has not yet arrived (RIm(n) is on) even though the reading
of data into the PDRm(n) is necessary (RDCm = 0).
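
The three generating conditions can be written as Boolean expressions. The sketch below is a reading of the prose only, since the drawing that defines the conditions is not reproduced here. In particular, it assumes that "data already read" means RC non-zero with DA on, that a non-zero RDC marks a read for which the PDR data is not needed, and that a zero RDC marks a read for which it is needed. The function name and argument order are hypothetical.

```python
def pdr_read_signals(prreq, ld, mk, moe, match, rc, ri, da, di, rdc):
    """Return (PDRREQ, PDRHIT, PDRWAIT) for the selected PDR element."""
    first = prreq and ld and (mk or moe)   # first condition
    second = match                         # second condition
    common = first and second

    data_ready = rc != 0 and da            # assumed meaning of "already read"
    pdrreq = common and (data_ready or rdc != 0)
    pdrhit = common and data_ready and not di and rdc == 0
    pdrwait = common and (rc == 0 or ri) and rdc == 0
    return bool(pdrreq), bool(pdrhit), bool(pdrwait)


# Valid data is waiting in the PDR: read it and report a hit.
print(pdr_read_signals(True, True, True, False, True, 1, False, True, False, 0))
# The data was invalidated (DI on): read request without a hit, so PBR applies.
print(pdr_read_signals(True, True, True, False, True, 1, False, True, True, 0))
# The request was issued but the data has not arrived: the processor waits.
print(pdr_read_signals(True, True, True, False, True, 1, True, False, False, 0))
```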

Of the signals generated or captured by the prefetched-data
read control unit 204 as described above, the RPDRN, PDROPm
and PDRREQ are sent to the prefetched-data unit 203 through
the signal line 211, the PDROPm', RDCm' and RPDRN are sent
to the prefetch status control unit 201 through the signal
line 207, the PDRHIT is sent to the cache memory unit 3 and
the cache data unit 102 through the signal lines 111 and
110 respectively, the PDRHIT and PDRWAIT are sent to the
cache request unit 101 through the signal line 107, and the
PDRWAIT is sent to the processor 2 through the signal line
112.

<Details of circuit units related to prefetch unit>

Fig. 9 shows the configuration of the cache request unit
101. The cache request unit 101 is inputted with the PFREQ,
PDRHIT, PDRWAIT, PFAi, REQPCRN, PDRTPi and BUFi through the
signal line 107. Further, the cache request unit 101 is
supplied with the PRREQ and PRADR from the processor 2
through the signal line 122.

A register 905 is inputted with the prefetch request PFREQ
and the PFAi, REQPCRN, PDRTPi and BUFi signals
corresponding to its accompanying information. The register
905 holds the request and its accompanying information
while the REQBSY to be described later is on and thereafter
outputs them to a signal line 915. A register 909 is
supplied with the PDRHIT signal and outputs it to a signal
line 916 as it is. A register 910 is inputted with the
PRREQ, PRADR and PDRWAIT. While the PDRWAIT is on, the
register 910 holds the request and address and thereafter
outputs them to a signal line 917.

Reference numeral 901 indicates a cache request control
circuit. When memory requests are simultaneously issued,
the cache request control circuit 901 assigns priorities to
them. When the data has already been read out by the
prefetch unit, the cache request control circuit 901
suppresses the access to the cache memory 1001. Therefore,
the cache request control circuit 901 is inputted with the
PFREQ through the signal line 915, the PRREQ through the
signal line 917 and the PDRHIT through the signal line 916.
The PDRHIT shows that the data valid for the memory request
PRREQ from the processor 2 has already been read out by the
prefetch unit. In the present embodiment, the cache request
control circuit gives priority to the PRREQ when the PFREQ
and PRREQ are simultaneously issued. From the above, the
request REQ signal to be sent to the cache memory 1001 or
the memory 1 is generated when either one of the following
two conditions is established.

1) PRREQ & ¬PDRHIT

2) ¬PRREQ & PFREQ

The REQ signal is sent to a cache access control 
unit 903 through a signal line 914. 

When the PFREQ and PRREQ are simultaneously issued, the
cache request control circuit 901 places PFREQ processing
in a wait state. Further, the cache request control circuit
901 generates a REQBSY signal for putting the issuance of a
subsequent prefetch request on hold and sends it to the
prefetch unit 105 through the signal line 120.

A selector 902 selects an address in accordance with the
selected request. Therefore, the selector 902 is supplied
with the PFAi through the signal line 915 and the PRADR and
PRREQ through the signal line 917. When the PRREQ is on,
the selector 902 selects the PRADR and outputs it as an ADR
signal. When the PRREQ is off, the selector 902 selects the
PFAi and outputs it as the ADR signal.
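
The request arbitration and the address selection can be summarized in a few lines. This is a sketch under assumptions: the function name `cache_request` is hypothetical, and REQBSY is modeled as being raised exactly when a prefetch request coincides with a processor request, which is one plausible reading of the description rather than a stated rule.

```python
def cache_request(prreq: bool, pdrhit: bool, pfreq: bool,
                  pradr: int, pfa: int) -> tuple[bool, bool, int]:
    """Return (REQ, REQBSY, ADR) for one cycle."""
    # REQ: a processor request that misses the PDR, or a prefetch request
    # when no processor request is present.
    req = (prreq and not pdrhit) or (not prreq and pfreq)
    # Assumption: a prefetch request that loses to a processor request waits.
    reqbsy = prreq and pfreq
    # Selector 902: the processor address wins whenever PRREQ is on.
    adr = pradr if prreq else pfa
    return req, reqbsy, adr


print(cache_request(True, False, True, pradr=0x100, pfa=0x200))
print(cache_request(True, True, False, pradr=0x100, pfa=0x200))
print(cache_request(False, False, True, pradr=0x100, pfa=0x200))
```

In the second call the PDR already holds valid data for the processor request, so no access to the cache memory is generated.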

Reference numeral 903 indicates the cache access control
unit, which is provided with an address translation buffer
mechanism (TLB) 904 for converting a virtual address into a
real address, an address array 905 for holding the
addresses of data blocks registered in the cache memory
1001, a cache hit check circuit 906 for checking a cache
hit from the result of retrieval by the TLB and the address
array, and a request order control circuit 907 for holding
and managing information affixed to a request. Since these
are basically constructed by applying the prior art, their
detailed description will be omitted. The cache access
control unit 903 is inputted with the REQ through the
signal line 914 and the REQPCRN, PDRTPi and BUFi through
the signal line 915. Further, the cache access control unit
903 is inputted with the ADR from the selector 902.

The cache access control unit 903 is responsive to the REQ
given from the cache request control circuit 901 to thereby
allow the TLB 904 to convert the ADR indicative of a
virtual address into a real address MADR. At this time, the
TLB 904 simultaneously checks the storage protection
information registered in the TLB and generates an EXP
signal when an access exception is detected. Next, the
address array 905 is searched using the real address and
the result of retrieval is checked by the cache hit check
circuit 906. If the corresponding address is found to be
registered, the cache hit check circuit 906 judges the
result to be a cache hit. Further, the cache hit check
circuit 906 generates a cache address CADR by the
conventional method and sends it to the cache memory unit 3
through the signal line 113. Furthermore, the cache hit
check circuit 906 generates a data delivery instruction
CADV. If the corresponding address is found not to be
registered, the cache hit check circuit 906 judges the
result to be a cache miss and generates a memory request
MREQ. The request order control circuit 907 sends the real
address MADR to the memory request unit 103 through the
signal line 115. When the prefetch buffer designation BUFi
is on, the request order control circuit 907 sends a line
transfer request LT along with it.

The request order control circuit 907 takes in the REQPCRN
and PDRTPi through the signal line 915. Upon the cache hit,
the request order control circuit 907 sends both to the
prefetch unit 105 as the CWPDRN and CPDRIP through the
signal line 120. Upon the cache miss, the request order
control circuit 907 sends both to the memory request unit
103 as the MWPDRN and MPDRIP through the signal line 115.



Fig. 10 shows the details of the cache memory unit 3. The
cache memory unit 3 captures the CADR from the cache
request unit 101 through the signal line 113 and takes in
data to be written into the cache and prefetched data from
the cache data unit 102 through the signal line 114.
Reference numeral 1001 indicates the cache memory, which is
accessed using the address CADR. The data to be written
into the cache is inputted to the cache memory 1001 through
the signal line 114. A selector 1003 is inputted with the
data read from the cache memory 1001 through a signal line
1002 and the data on the signal line 114. The selector 1003
takes in the PDRHIT signal from the prefetch unit 105
through the signal line 111 and selects data in response to
the signal. When the PDRHIT is on, the valid prefetched
data read from the PDR has already been sent to the signal
line 114 and is selected by the selector 1003. The selector
1003 sends it to the processor 2 through the signal line
108. This brings about an advantageous effect for cost
reduction because no other data line is needed in
constructing the processor 2.

<Supplementary description of prefetch control>

A supplementary description will hereinafter be made of the
prefetch function and operation, a method of executing the
program, and the like, centering on points not illustrated
in the above description.

(1) Type-inherent prefetch control

A method of initially setting each PCR and the prefetch
control will be described below for each data structure
type.

1) Sequential vector

A prefetch for the sequential vector is designated by a
single PCR. The initially-set information based on the
program is as follows. Fields for which no description is
given need not be set, which is useful in reducing the
set-up overhead. Examples of initially setting the PCR will
be described below based on a program example 1 shown in
Figs. 22A through 25. A PCR0 will be used herein.

- ACT=1: start up prefetch operation
- ORG=1: designate PCR as origin PCR
- MOD=(8): stride is 8B
- DL=(3): operand length is 8B
- BUF=0: only PDR is used as prefetch buffer
- IID=(GR12): 12 is designated as base register number of
  Load instruction
- LA=1: designate existence of look-ahead control
- LAC=(20): look ahead to 20 elements
- SKIP=0: absence of address skip
- PDRTP=0: designate leading element of PDR
- BAL=0: (value identical to PDRTP)
- NR=1: only memory access instruction accesses prefetched
  data
- MOE=0: mask override designation is unnecessary
- PFA=(&B(1)): initial element address of vector B (initial
  value of GR12)

2) Prefetch control on indexed vector 

The prefetch of the indexed vector is designated by the
origin PCR for controlling the reading of an index and a
linked PCR (normally plural) for controlling the reading of
a target. A description will be made below of examples of
initially setting PCR in a program example 2 shown in Figs.
26A through 29 and examples of initially setting PCR in a
program example 3 shown in Figs. 30A through 33.

<Initialization of origin PCR> PCR0 is used

- ACT=1: start up prefetch operation
- ORG=1: designate PCR as origin PCR for reading indexed
  vector
- MOD=(8): stride is 8B
- DL=(3): operand length of indexed vector is 8B
- BUF=0: only PDR is used as prefetch buffer
- IID=(GR10): 10 is designated as base register number of
  Load instruction
- LA=1: designate existence of look-ahead control
- LAC=(5): look ahead to 5 elements
- SKIP=0: absence of address skip
- PDRTP=0: designate leading element of PDR
- BAL=0: (value identical to PDRTP)
- NR=2: refer to corresponding PDR to generate prefetch
  addresses for Load instruction and Fload instruction
- MOE=0: mask override designation is unnecessary
- PFA=(&L(1)): initial element address of indexed vector L
  (initial value of GR10)

<Initialization of linked PCR> PCR1 is used

- ACT=1: start up prefetch operation
- ORG=0: linked PCR for reading target vector B
- MOD=(&B(1)): base address of target vector B
- DL=(3): operand length of target vector B is 8B
- BUF=1: both PDR and cache are designated as prefetch
  buffers
- IID=(GR12): 12 is designated as base register number of
  Fload instruction
- LA=0: designate absence of look-ahead control (request is
  unnecessary because it belongs to reading of data in
  parent PCR)
- SKIP=0: absence of address skip
- PDRTP=0 (program example 2) / 1 (program example 3)
- BAL=0: (value identical to PDRTP)
- NR=1: Fload instruction refers to corresponding PDR
- MOE=0 (program example 2) / 1 (program example 3)
- LL=0: indicate that PCR is linked PCR for indexed vector
- LPCR=(PCR0): designate origin PCR number
- PDRQP=0: identical to PDRTP of origin PCR
- PFA: no need for initialization

In the case of the program example 3, the contents of the
PDR are empty-read, by the designation of the mask
override, in accordance with an Fload instruction (position
(7,3) in Fig. 32) whose execution is suppressed in a
prologue stage. It is therefore noted that the PDRTP of the
linked PCR is initially set to 1.

Each of the read addresses for the target vector is
generated by adding a value, obtained by shifting a read
indexed-vector element by the number of bits designated by
DL, to the modifier of the linked PCR. The generated
address is held in the PFA field of the linked PCR and used
for a prefetch request.
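
The address generation for the target vector is a shift and an add. A minimal sketch follows; the function name `target_address` is hypothetical, the base address `0x8000` merely stands in for &B(1), and the index values are invented for the example and treated as zero-based offsets.

```python
def target_address(mod: int, index: int, dl: int) -> int:
    """Address of the target element: MOD + (index << DL)."""
    return mod + (index << dl)


base_b = 0x8000            # hypothetical value standing in for &B(1)
indices = [0, 3, 1, 7]     # hypothetical index-vector elements already read
pfas = [target_address(base_b, i, dl=3) for i in indices]
print([hex(a) for a in pfas])
```

With DL=3 each index is scaled by 8, matching the 8-byte operand length of the target vector.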

3) Prefetch control on linked lists

The prefetch of the linked lists is designated by two or
more PCR for controlling the reading of data in a plurality
of multi-level tables linked with the highest sequential
vector and its elements as the starting points. The
initially-set information based on a program is as follows:

<Origin PCR> 

- ACT=[0/1]: when ACT=1, start up prefetch operation
- ORG=1: designate PCR as origin PCR for reading indexed
  vector
- MOD=(stride)
- DL=(designate operand length of indexed vector)
- BUF=[0/1]: [PDR/cache]
- IID=(base register number)
- LA=[0/1]: designate presence or absence of look-ahead
  control
- SKIP=[0/1]: when SKIP=1, need designation of SKIPP and
  SKIPGAP
- PDRTP: when MOE=0 and MOE=1, designate the number of
  empty readings of PDR in prologue stage
- BAL=0: (value identical to PDRTP)
- NR=[2/3/...]: designate the number of accesses to PDR
  with table elements of highest levels stored therein
- MOE=[0/1]: mask override designation
- PFA=(base address): initial element address of indexed
  vector

<Initialization of linked PCR>

- ACT=[0/1]: when ACT=1, start up prefetch operation
- ORG=0: designate PCR as linked PCR for reading low-level
  list
- MOD=(displacement): displacement of next level relative
  to linked data in table
- DL=(designate length of linked data)
- BUF=[0/1]: [PDR/cache]
- IID=(base register number)
- LA=0: designate absence of look-ahead control (request is
  unnecessary because it belongs to reading of data in
  parent PCR)
- SKIP=0: skip is unnecessary
- PDRTP: when MOE=0 and MOE=1, the number of executions of
  memory access instruction in prologue stage
- BAL=(value identical to PDRTP)
- NR=[1/2/...]: designate 1 in the case of PCR of the
  lowest level and designate the number of accesses to PDR
  storing corresponding data therein in the case of other
  levels
- MOE=[0/1]: mask override designation
- LL=1: designate PCR as being linked PCR in linked lists
- LPCR=(parent PCR number)
- PDRQP=(identical to PDRTP of parent PCR)
- PFA: initialization is unnecessary

Each of the read addresses of data to be generated by a
baby PCR is obtained by adding the modifier of the baby PCR
to a high element (linked data) read by a parent PCR. The
generated address is held in the PFA field of the baby PCR
and used for a prefetch request. The control on the
issuance of the prefetch request by the baby PCR is similar
to that on the prefetch for the indexed vector.
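
A two-level case of this parent and baby relationship can be sketched as follows. The memory image, the addresses and the function names are all invented for illustration; only the rule "address = linked data read by the parent + modifier of the baby PCR" is taken from the text.

```python
memory = {                 # hypothetical memory image: address -> word
    0x100: 0x400,          # top-level table element 0 holds a pointer
    0x108: 0x500,          # top-level table element 1 holds a pointer
    0x410: 11,             # field at displacement 0x10 of the record at 0x400
    0x510: 22,             # field at displacement 0x10 of the record at 0x500
}


def child_address(linked_data: int, mod: int) -> int:
    """Address generated by the baby PCR: parent element + modifier."""
    return linked_data + mod


def prefetch_two_levels(base: int, stride: int, count: int, mod: int) -> list[int]:
    values = []
    for n in range(count):
        pointer = memory[base + n * stride]   # read under the parent (origin) PCR
        pfa = child_address(pointer, mod)     # held in the PFA field of the baby PCR
        values.append(memory[pfa])            # read under the baby PCR
    return values


print(prefetch_two_levels(base=0x100, stride=8, count=2, mod=0x10))
```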

(2) Address skip function 

The present embodiment includes an address skip 
function. The address skip function is effective for ap- 
plying the prefetch to a partial reference to a multidimen- 
sional array. 

There may be a case where, since the elements on the
periphery of an array provide a boundary condition in
numerical calculations using a multidimensional array, they
are not used for the principal calculations. There may also
be a case where an array to which intentionally-unaccessed
elements are added is defined to avoid competition between
memory banks. Figs. 16A and 16B show an example in which a
partial access to a two-dimensional array is performed. In
this case, an array A(100, 200) with 100 elements in a
column and 200 elements in a row is read out on a column
priority basis. In this program, the portion A(1-99, 1-199)
surrounded by a thick frame is accessed. Namely, upon data
prefetch, it is necessary to continue the prefetch from the
top row element of the next column without accessing the
final row element of each column. The memory regions
actually accessed are blocks of 99 contiguous elements
separated from one another by a gap of one element. When
the prefetch is started up anew for each column, overheads
are incurred, causing the risk that sufficient speedup
cannot be achieved. Since the blocked memory regions can be
read out with a single prefetch start-up even in this case,
the present address skip function is effective for a
reduction in overheads.
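
The access pattern of this example can be reproduced with a small address generator. The parameter names `block_len` and `gap` are hypothetical and are not the register fields of the embodiment, and the 8-byte element size is an assumption; the sketch only checks that one continuous stream with a periodic skip visits exactly the elements A(1-99, 1-199) of a column-major array A(100, 200).

```python
def skip_addresses(base: int, stride: int, block_len: int, gap: int, total: int):
    """Yield `total` addresses, skipping `gap` elements after every `block_len`."""
    addr = base
    in_block = 0
    for _ in range(total):
        yield addr
        in_block += 1
        addr += stride
        if in_block == block_len:
            addr += gap * stride   # jump over the unaccessed elements
            in_block = 0


ELEM = 8  # assumption: 8-byte elements
addrs = list(skip_addresses(base=0, stride=ELEM, block_len=99, gap=1,
                            total=99 * 199))

# Column-major A(100, 200): element (i, j), 1-based, lies at
# ((j - 1) * 100 + (i - 1)) * ELEM.
expected = [((j - 1) * 100 + (i - 1)) * ELEM
            for j in range(1, 200) for i in range(1, 100)]
print(addrs == expected, len(addrs))
```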

The address skip function can bring about a great effect
where the total amount of data is large, on the order of
exceeding the capacity of the cache, and the prefetch
start-up time cannot be neglected as compared with the
execution time of the innermost loop. When an execution
cycle of the innermost loop is 50 cycles per iteration and
an overhead of 25 cycles is incurred on the start-up of the
prefetch in the case of a three-dimensional array of
100×100×100, for example, the present function can provide
the prospect of speedups of about 50%.

(3) Interrupt processes

A detailed description will be made below of the processes
based on hardware and software at the time that an
interrupt is triggered during a prefetch operation. The
interrupt processes related to the prefetch operation
differ according to the type of exception event indicative
of the cause thereof. The exception events are classified
into four types as follows:

O Recoverable exceptions detected during prefetch operation

Example: a page fault, a TLB miss fault at prefetch

O Unrecoverable exceptions detected during prefetch
operation

Example: a data memory protection check, a non-aligned data
access trap at prefetch

O Recoverable exceptions detected independent of prefetch
operation

Example: a page fault and a TLB miss fault detected by
memory accesses according to an external interrupt and an
instruction execution

O Unrecoverable exceptions detected independent of prefetch
operation

Example: a high priority machine check, a memory protection
check detected by a memory access according to an
instruction execution

When the exception event indicative of the cause of an
interrupt is a recoverable one, the prefetch can continue
after the resumption. In this case, in the present
embodiment, only the necessary prefetch control information
is saved and recovered, and the overhead is reduced by not
saving and recovering the data in the prefetched-data
registers.

Outlines of the processes related to a prefetch at the time
that an interrupt is generated in the information
processing apparatus illustrative of the present invention
will be described below.

1) Detection of exception

An access exception EXP incident to memory reading for the
prefetch is detected based on the indexing of the TLB 904
in the cache request unit 101.

The detection of the access exception is also required upon
execution of the corresponding memory access instruction by
the processor 2. This is because the timing for executing
and suppressing a subsequent instruction at the time of the
detection of the exception is critical, so that it may be
difficult to use the result of detection obtained from the
request processing at the prefetch. Thus, the prefetch unit
does not report the exception to the processor 2. The
detection of the access exception by the processor 2 is
performed based on the indexing of a TLB (not shown) in the
processor 2 as usual.

2) Operation at the time of exception detection 

When an exception EXP is detected incidentally to a
prefetch request, the data invalid flag DI attached to the
PDR element corresponding to the request is set on to
indicate the invalidity of the data. Even in this case, the
subsequent data prefetch continues. The reasons for this
continuation are as follows:

O Since there is a possibility that the execution of the
corresponding memory access instruction is suppressed under
conditional execution control, the corresponding exception
is not necessarily detected.

O There may be cases where no exceptions are produced at
the subsequent addresses when the subsequent data addresses
are quite separate, as in the case of, for example, a
non-contiguous sequential vector, an indexed vector and a
prefetch of a linked list.

O When the prefetch of the subsequent data is stopped, the
memory access instruction is kept waiting for the
prefetched data so that a hang-up may occur.

3) On-interrupt operation 

When an exception is detected upon execution of the
corresponding memory access instruction, the execution of
the corresponding instruction is suppressed and an
on-interrupt operation based on hardware is executed. At
this time, the processor 2 reports the occurrence of the
interrupt to the prefetch unit 105 using the INT signal.
The prefetch unit 105 sets the SUSP flag of the
corresponding PSR. Thus, all the prefetch operations stop
issuing subsequent requests and wait for the arrival of the
data for the already issued requests. When all the data has
arrived, the PFBSY flag of the PSR is turned off by the
hardware operation. This state of the prefetch mechanism is
called a pause state. The processor 2 reads the PSR at the
head of the interrupt routine and confirms that the
prefetch mechanism is in the pause state, followed by
execution of the subsequent interrupt routine. When the
contents of the PSR, the PCR, the PDR and the PDSR, which
are indicative of prefetch information, are preserved upon
a recoverable interrupt while the interrupt is being
served, namely, when the prefetch mechanism illustrative of
the present embodiment is not used at all until the return
to the original program, it is unnecessary for the
interrupt routine to save them. Thus, no processing
overheads for the corresponding process are incurred.

When there is no assurance that the corresponding prefetch
information is preserved upon the recoverable interrupt while the
interrupt is being served, the interrupt routine needs to save only
the PSR and PCR in the present embodiment. The saving/recovery is
executed in accordance with the instruction. Since the re-reading of
the PDR is performed after the return from the interrupt even in
this case, the saving of the PDR is unnecessary. While the saving of
the PDSR is also unnecessary, it is necessary to initialize status
information as will be described subsequently.

Upon the recoverable interrupt, the prefetch operation is made
invalid by resetting the valid flags VLD of all the PCR in
accordance with the interrupt routine. In this case, the saving of
various resources related to the prefetch is unnecessary.
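
The entry into the pause state described above can be pictured with a small software model. This is only an illustrative sketch, not the hardware of the embodiment: the class name `PrefetchUnit`, its method names and the `outstanding` counter are assumptions introduced here, and only the SUSP and PFBSY flags and the INT signal are taken from the text.

```python
class PrefetchUnit:
    """Toy model of the PSR flags around an interrupt (names are hypothetical)."""

    def __init__(self):
        self.susp = False      # SUSP flag: no new prefetch request may be issued
        self.pfbsy = False     # PFBSY flag: issued requests are still in flight
        self.outstanding = 0   # assumed counter of issued, not yet arrived requests

    def issue_request(self):
        """Try to issue one prefetch request; refused while SUSP is set."""
        if self.susp:
            return False
        self.outstanding += 1
        self.pfbsy = True
        return True

    def data_arrived(self):
        """One issued request completes; PFBSY turns off when none remain."""
        if self.outstanding == 0:
            raise RuntimeError("no request is outstanding")
        self.outstanding -= 1
        if self.outstanding == 0:
            self.pfbsy = False

    def on_int_signal(self):
        """The processor reports an interrupt with INT: stop issuing requests."""
        self.susp = True

    def in_pause_state(self):
        """Pause state: suspended, and every issued request has arrived."""
        return self.susp and not self.pfbsy
```

In this model an interrupt routine would poll `in_pause_state()` at its head before touching any prefetch register, which mirrors the PSR read described in the text.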


4) On-reset operation from interrupt 

The interrupt routine effects recovery processing on a PSR and a
valid PCR when the program returns from an interrupt that cannot
assure the preservation of the prefetch information. During that
time, the SUSP flag is set to ensure that the prefetch operation
does not run forward with the PCR information while it is being
recovered. When the preservation of the prefetch information is
ensured while the interrupt is being served, the corresponding
recovery operation is unnecessary.

The SUSP flag is reset by executing an instruction immediately
before the return of the interrupt routine to the original program.
Thus, the prefetch mechanism is brought into an operating state
according to the designation of the recovered PCR so that the
prefetch operation is resumed. By executing a return instruction
from the interrupt routine, the processor 2 resumes the execution of
the original program. With the return from the interrupt, a prefetch
request is issued by using the recovered PFA, and data is stored
from a PDR element designated by the recovered PDRTP. This element
is positioned next to the PDR elements, equal in number to the value
indicated by the BAL field, which have been abandoned by the
interrupt.

When the resumption of the prefetch is not suitable after the
interrupt, for example, upon detection of a hard failure of a
specific circuit related to the prefetch mechanism, the VLD flag is
set off in accordance with a PCR initialize instruction so as to
invalidate the prefetch. Even in this case, the proper operation of
the program is ensured by the following prefetch buffer redirection.

5) Prefetch buffer redirection 

When the prefetched data is invalid due to the updating or the
occurrence of the access exception, the control for reading the
proper data from the cache memory 1001 or the memory 1 is executed
upon execution of the corresponding memory access instruction as
already described. This control is performed by not issuing the
PDRHIT signal, which would otherwise suppress the processing of the
memory access instruction according to the memory request PRREQ.
This operation is called prefetch buffer redirection, which will be
abbreviated as PBR.

Even when it is desired to access the prefetched data abandoned upon
interruption after the interrupted program has been resumed, the
above PBR is used. Namely, upon program recovery from the interrupt,
a redirection data count RDC field of a PCR is set to the value of
the BAL at the time of the occurrence of the interrupt. As long as
the RDC is not zero when the memory access instruction is executed,
the prefetch unit performs the PBR. The RDC is decremented by one
each time memory data is read. Upon its recovery, a PDROP points to
the PDR element position where the oldest one of the abandoned data
has been held. Thereafter, the PDROP is incremented each time the
PBR operation is performed.

Figs. 19A and 19B show status examples of a PDR and a PDSR at the
context switch. Fig. 19A illustrates the status of a PDR and a PDSR
before the control is delivered to the interrupt routine. In this
case, n data indicated in the BAL field are stored in the
corresponding PDR and data invalid flags DI are set to some of them
by updating. Even if the interrupt occurs, these PDR and PDSR are
not saved. Fig. 19B shows the status of a PDR and a PDSR immediately
after the control has returned to the interrupted program. In this
case, the PDROP, PDRTP and BAL recover the contents at interruption
in accordance with the program and the redirection data count RDC is
set to the same value n as the BAL. All the contents of RC fields of
the PDSR are initially set to 0. When a memory access instruction
associated with the corresponding PDR is executed and a PDR read
request is issued, data is read from the cache memory 1001 or the
memory 1 under the prefetch buffer redirection. This prefetch buffer
redirection is performed while the RDC is not 0. Further, the RDC is
decremented by 1 at each PDR reading and the PDROP proceeds to the
next stage. In doing so, the proper program operation can be ensured
even if the data in the PDR are not saved and recovered upon
interruption. Further, the efficiency of interrupt processing can be
enhanced by reducing the processing time required to save and
recover the data.
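
The way the RDC and the PDROP steer accesses after the return from an interrupt can be sketched in software as follows. This is a simplified model under stated assumptions, not the circuit itself: the class `ResumedPrefetchBuffer`, its methods and the dictionary used as memory are hypothetical, the PDR is treated as a ring buffer, and the BAL is used only to initialize the RDC and the PDRTP.

```python
class ResumedPrefetchBuffer:
    """Toy model of a PDR ring buffer right after returning from an interrupt."""

    def __init__(self, memory, size, pdrop, bal):
        self.memory = memory                 # backing store (cache or main memory)
        self.pdr = [None] * size             # PDR contents were not saved
        self.pdrop = pdrop                   # PDROP: oldest abandoned element
        self.pdrtp = (pdrop + bal) % size    # PDRTP: element after the abandoned ones
        self.rdc = bal                       # RDC: set to the BAL value at interruption

    def prefetch(self, address):
        """Resumed prefetch: store newly read data at PDRTP and advance it."""
        self.pdr[self.pdrtp] = self.memory[address]
        self.pdrtp = (self.pdrtp + 1) % len(self.pdr)

    def load(self, address):
        """Memory access instruction: redirected to memory while RDC is not 0."""
        if self.rdc != 0:
            value, source = self.memory[address], "PBR"
            self.rdc -= 1
        else:
            value, source = self.pdr[self.pdrop], "PDR"
        self.pdrop = (self.pdrop + 1) % len(self.pdr)
        return value, source
```

With `bal=2`, the first two loads are served by redirection and the third is served from the element that the resumed prefetch filled, so nothing in the PDR had to be saved across the interrupt.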

(4) Coherency control

In the present prefetch system, hardware assures the coherency
between the PDR and the cache memory 1001 in accordance with the
prefetch buffer redirection. Namely, when the corresponding data is
updated under a store instruction during the period from the
execution of a prefetch start-up instruction until the data is read
out in accordance with a memory access instruction, it is ensured
that the memory access instruction receives the updated data.

An example of this control time chart is shown in Fig. 20. The same
drawing illustrates the manner in which load instructions for
accessing the corresponding data are repeatedly executed under loop
control after a prefetch has been started up by the execution of an
instruction for setting the prefetch. Numerals with () affixed to
the load instructions indicate the loop iteration numbers. In the
case of the present program, a store instruction of a first loop
updates an operand for a load instruction of a second loop. In the
present example, the prefetch unit starts to issue a prefetch
request from a third cycle (C3) at each cycle after a prefetch
address has been set to the corresponding PFA. The prefetch address
PFA is converted into a cache address CADR and read cache data CDATA
is stored in its corresponding PDR1(0). When a load instruction (1)
is executed, a PDR read request PDRREQ is generated. Further, the
data stored in the PDR1(0) is read out and stored in the general
purpose register GR of the processor 2 in C12. A prefetch request
corresponding to a load instruction (2) is issued in C4 and the
status of a PDSR1(1) is changed to (1100) correspondingly. Since
updating is detected upon execution of a store instruction (1), the
state of the PDSR is changed to (1101). Since the data arrives
subsequently, the status thereof is changed to (1011). When a load
instruction (2) is executed, the invalidity of the data is
recognized from the status of the PDSR and hence a PDRHIT is
suppressed so that the PBR is executed. Thus, a memory request
address for the load instruction (2) is set as the cache address in
C12 in place of the prefetch address. The read data of the load
instruction (2) is used in place of the data in the PDR1(1), which
predates the update, and is stored in the corresponding register in
the processor 2 in C15.
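
The store address check and the fallback to PBR traced in this time chart can be approximated by the following sketch. It is a simplified model and not the PDSR encoding of the embodiment: the class `CoherentPrefetchBuffer` and its fields are assumptions made for illustration, a single `invalid` flag stands in for the PDSR status bits, and data is taken to arrive at the moment the prefetch is issued.

```python
from collections import deque


class CoherentPrefetchBuffer:
    """Toy model: a store to a prefetched address invalidates the buffered copy."""

    def __init__(self, memory):
        self.memory = memory      # backing store (cache or main memory)
        self.entries = deque()    # prefetched entries in program order

    def prefetch(self, address):
        """Read ahead and buffer the data together with its address."""
        self.entries.append(
            {"address": address, "data": self.memory[address], "invalid": False}
        )

    def store(self, address, value):
        """Store instruction: update memory and flag any matching buffered entry."""
        self.memory[address] = value
        for entry in self.entries:
            if entry["address"] == address:
                entry["invalid"] = True    # store address check hit

    def load(self, address):
        """Load instruction: use the buffer unless the entry was invalidated."""
        entry = self.entries.popleft()
        if entry["invalid"]:
            return self.memory[address], "PBR"    # redirected read of fresh data
        return entry["data"], "PDR"
```

As in the time chart, a load whose operand was overwritten after being prefetched receives the updated value through redirection rather than the stale buffered copy.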






Owing to the assurance of the coherency by hardware, the program
logically need not consider the order of the store instruction and
the prefetch start-up instruction in order to ensure correct
operation when the prefetch is applied. Therefore, this is effective
for code optimization that executes the prefetch start-up
instruction as early as possible and reduces start-up overheads.

In the present embodiment, the coherency is ensured by hardware.
When, however, the updating of prefetched data frequently occurs,
there may be cases in which the performance is affected. In such a
case, the coherency can be ensured while avoiding the influence on
the performance by look-ahead control of the prefetch. This will be
described by the following example.

<Example>

      DO 10 I = 1, N
               = A(I - M)
         A(I)  =
   10 CONTINUE

In the present example, symbol M is regarded as being known at
compile time. In the example, the prefetch is regarded as being
effective for an improvement in performance even if a look-ahead
count LAC is designated so as to meet LAC < M. Since the prefetch
does not proceed ahead of the data last transferred to the memory
access instruction by more than the number indicated by LAC at this
time, the data prefetch based on A(I - M) is not generated before
the corresponding data is updated by the store to A(I). Accordingly,
the updated data is always prefetched so that the coherency is
ensured. One example of a time chart for describing a data coherency
assuring operation under the look-ahead control is shown in Figs.
21A and 21B.

The present example shows the coherency assurance at the time of
reading of an array element A(I - 4) in a DO loop shown in the same
drawing. Since the array element A(I - 4) in the loop is always
updated upon execution of the loop four iterations earlier, it is
necessary to read the result of its updating. Therefore, the
look-ahead count LAC is designated as 3 and the look-ahead based on
the prefetch is limited to three elements. Since the value of the
BAL reaches the LAC when data reading for a load instruction (3) is
completed in a fourth cycle (C4), the issuance of a request is put
on hold. When a load instruction (1) (abbreviated as ld.1) is
executed in C6 and PDR data is read out, the BAL is brought to 2, so
that a prefetch request can be issued. It is thus possible to ensure
that the next prefetch of ld.5 is executed after execution of an
instruction of ld.2. On the other hand, the writing st.5 of data
into an array element A(5), which must be executed before the
prefetching of ld.5, is executed before ld.2 as is apparent from the
program. Accordingly, the instruction ld.5 is executed after
completion of st.5 and thereby updated data can be obtained.
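
The effect of choosing LAC < M can be checked with a small simulation. This is a sketch under assumptions rather than a model of the actual timing: the loop body is taken to be A(I) = A(I - M) + 1, the prefetcher is assumed to refill its buffer immediately after each load and before the store of the same iteration, the store address check is deliberately left out so that only the look-ahead limit protects coherency, and the function names are hypothetical.

```python
from collections import deque


def run_with_lookahead(n, m, lac):
    """Run A(I) = A(I-M) + 1 for I = 1..n with a prefetcher limited to lac >= 1."""
    a = {i: 0 for i in range(1 - m, n + 1)}    # A(1-M)..A(N), all zero at first
    buffer = deque()                            # prefetched operands, in loop order
    next_iteration = 1                          # next iteration to prefetch for

    def refill():
        nonlocal next_iteration
        # Prefetch until the buffer holds lac elements (BAL == LAC) or the loop ends.
        while len(buffer) < lac and next_iteration <= n:
            buffer.append(a[next_iteration - m])
            next_iteration += 1

    refill()                                    # prefetch start-up before the loop
    for i in range(1, n + 1):
        operand = buffer.popleft()              # load takes its data from the buffer
        refill()                                # BAL dropped, so prefetching resumes
        a[i] = operand + 1                      # store of this iteration
    return [a[i] for i in range(1, n + 1)]


def run_reference(n, m):
    """The same loop executed without any prefetching."""
    a = {i: 0 for i in range(1 - m, n + 1)}
    for i in range(1, n + 1):
        a[i] = a[i - m] + 1
    return [a[i] for i in range(1, n + 1)]
```

With M = 4, a look-ahead count of 3 reproduces the reference result, whereas a count of 4 lets the prefetch for the fifth iteration overtake the store of the first iteration and read a stale value.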

<Program examples>

Several programs that access or refer to the target data structures
will now be described, together with an optimizing method for
speeding them up, the object programs to which the method is
applied, and instruction execution traces in the present information
processing apparatus.

Figs. 22A and 22B show a program which accesses or refers to a
sequential vector B. This case assumes that a loop length 3N is long
and a vector B(I) exceeds the capacity of the cache in the processor
2. The optimization for accessing an externally-provided cache
(hereinafter called simply cache memory) of the processor 2 is
performed. Assuming that about 7 cycles are required to access the
cache memory 1001 in the present embodiment, the following loop
unrolling and instruction scheduling are applied. A source program
is first brought to triple loop unrolling as shown in Fig. 22B.
Instructions obtained by unrolling this loop portion to the object
program and subjecting it to the instruction scheduling are shown in
Fig. 23. In the drawing, the instructions are described three by
three per row. This is intended to make it easy to see the
correspondence with the source program after the loop unrolling as
shown in Fig. 22B. Instruction positions on the memory 1 are aligned
in the order of a first row and a first column (hereinafter
described as (1.1)), (1.2), (1.3), (2.1) and so on. Namely, if the
instructions up to three rows are described in accordance with a
normal writing way, then they are expressed as follows:

(1.1) [MR2] Fload (GR12) -> FR2: B(I)
(1.2) [MR3] And MR3, MR10 -> MR3
(2.1) [MR2] Add GR12+d1 -> GR12
(2.2) [MR1] Set 1 to MR3
(2.3) [MR4] Fstore FR4 -> (GR13): A(I+2)
(3.2) [MR1] And MR3, MR10 -> MR3
(3.3) [MR4] Add GR13+d2 -> GR13

Mask register numbers (designated by rm) for controlling the
presence or absence of execution of the instructions are assigned to
the heads of the instructions. Figs. 24 and 25 show alignment
(traces) of executed instructions at the time that the objects have
been executed. In the original program, however, a loop length is
set to 6 (N = 2) for simplicity. Upon a first execution of an object
program in a loop, the initial one instruction on a second column
and the initial three instructions on a third column are
instructions whose executions are unnecessary. Their execution is
suppressed under conditional execution control because the initial
values of mask registers are set as shown in the same drawing. It is
necessary to note that MR1 is always 1 (expressed as true in a
logical value). A first program execution stage including this
execution suppression is called prologue stage. In the case of a
second execution, all the instructions in the loop are executed.
This program execution stage is called body stage. In this example,
the body stage is executed once. However, the body stage is normally
repeatedly executed according to the loop length. In the case of a
third execution, the instructions of twenty-first through
twenty-sixth rows and a twenty-eighth row of the first column, the
instructions of twenty-fourth through twenty-ninth rows of the
second column and the instructions of twenty-seventh and
twenty-eighth rows of the third column are suppressed from
execution. This final
loop execution stage is called epilogue stage. Eventually, the
instructions to be executed would be only those surrounded by
frames.
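
The prologue, body and epilogue stages obtained with mask registers can be illustrated by a much smaller loop than the one in Fig. 23. The following is only a sketch: it assumes a plain copy loop A(I) = B(I) split into a load stage and a store stage, and the names `load_mask` and `store_mask` are hypothetical stand-ins for mask registers, not the register assignment of the embodiment.

```python
def pipelined_copy(b):
    """Copy b with a two-stage software pipeline whose stages are predicated."""
    n = len(b)
    a = [None] * n
    reg = None                      # value loaded during the previous pass
    trace = []                      # stage name of every pass through the loop
    for p in range(n + 1):          # one extra pass drains the pipeline
        load_mask = p < n           # false only when no element is left to load
        store_mask = p >= 1         # false only before the first load has completed
        if store_mask:
            a[p - 1] = reg          # store stage uses the value loaded one pass earlier
        if load_mask:
            reg = b[p]              # load stage fetches the next element
        if load_mask and store_mask:
            trace.append("body")
        elif load_mask:
            trace.append("prologue")
        elif store_mask:
            trace.append("epilogue")
        else:
            trace.append("idle")    # only reached for an empty input
    return a, trace
```

The same loop code is executed in every pass; only the mask values decide which instructions take effect, which is the role the mask registers play in the object program.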

When this code is executed, a floating point number memory access
instruction Fload at a position (1.1) causes a floating point
register FR2 to take in a vector element B(I) from a seventh cycle.
Addresses have been stored in a general purpose register GR12. It is
necessary to process nine instructions until a floating point number
store instruction Fstore at a position (5.1), which accesses the
FR2, is executed. The present processor 2 needs nine cycles. Thus,
since the necessary data has already been captured in the FR2, the
Fstore instruction is executed without queuing. A Fload instruction
at a position (4.2) reads out the next vector element B(I + 1).
Since, however, the FR2 is being used, the vector element B(I + 1)
is taken in an FR3. However, the GR12 is shared as the register for
storing the address.
A data reference latency longer than a pipeline pitch can be hidden
behind the processing of other instructions by optimally scheduling
the object code after the loop unrolling in this way. In the present
embodiment, if all the data are held in the cache memory 1001, then
the object program can be executed without queuing for data
reference as described above. If, however, the data cannot fully fit
into the cache memory 1001, then the queuing takes place. In order
to avoid this, a method of further increasing the multiplicity of
the loop unrolling may be considered. Since, however, the number of
necessary registers increases, limitations are imposed on this
method. The present embodiment is effective at solving this problem.

Fig. 26A shows an example of a program that refers to an indexed
vector B. The index vector is L. In the same manner as the program
example 1, the program is also brought to triple loop unrolling as
shown in Fig. 26B to thereby generate objects as shown in Fig. 27.
For simplicity, the length of a loop is defined as 3 (N = 1) and
traces resultant from its instruction execution are illustrated in
Figs. 28 and 29. Instructions executed under conditional execution
control are surrounded and indicated by frames.

Fig. 30A illustrates an example of a program that refers to an
indexed vector. The result is saved in a vector A only in the case
where the value of an index is positive. In regard to this as well,
a translated source program is shown in Fig. 30B, an object program
is illustrated in Fig. 31 and instruction execution traces are
depicted in Figs. 32 and 33.

As has been described above, the information 
processing apparatus according to the present embod- 
iment can bring about the following features. 

1) Data prefetch asynchronous with processor

A prefetch operation with respect to each individual data is
performed asynchronously with the processor 2 without executing an
explicit prefetch instruction. It is thus possible to avoid a
reduction in performance, which is caused by an instruction
execution bottleneck developed in synchronous prefetching.

2) Prefetch in data units 

In a data unit prefetch mechanism, data are read one by one in
association with a memory access instruction that needs the data and
held in the corresponding prefetch data register (PDR). Even in the
case of data on the same address, the data are read out again
according to the memory access instruction and stored in different
places on the prefetch data register (PDR). The above prefetch in
the data units is suited to referring to a large-scale sequential
vector, a non-contiguous sequential vector large in stride, etc. and
to prefetching data low in reuse probability.

3) Prefetch in address units 

Since there is provided a function for storing prefetched data in
the cache memory 1001, an advantageous effect can be brought about
in that a load imposed on the memory 1 is reduced by prefetching the
data from this cache in cases where the prefetch made to the same
address frequently occurs. This is effective at referring to a
target vector in indexed vectors and low data of a linked list, for
example.

4) Prefetch address 

Since an address generated for a prefetch request 
should be identical to an operand address for a memory 
access instruction, it is a virtual address. The prefetch 
request is generated asynchronously with the processor 
2 and is held in a dedicated prefetch address register. 

5) Prefetched-data buffer 

Dedicated prefetch data registers (PDR) for holding read data
therein are provided to perform data prefetching in data units.
Since the PDR are provided, the access throughput required of the
cache can be held to the same degree as when no prefetch is used.
Thus, the provision of the PDR is effective at reducing an increase
in cost.

6) Initialize and request control 

Upon prefetch execution, the initialization and star- 
tup of control information and the stoppage of its supply 
are performed in accordance with dedicated instruc- 
tions. 

7) Control on transfer of data to processor 2 

The transfer of the data from a prefetch buffer to the 
processor 2 is requested by executing the correspond- 
ing memory access instruction. However, the memory 
access instruction is identified with its base address reg- 
ister number as a key. Thus, an instruction dedicated to 
the transfer is unnecessary. Further, an extension such 
as provision of an identification field for the conventional 
instruction or the like is also unnecessary. 
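
The idea of using the base address register number as the key can be sketched as a lookup table. The class `PrefetchBufferTable`, its methods and the dictionary used as memory are assumptions introduced only for this illustration; the text states only that the memory access instruction is identified by its base address register number.

```python
from collections import deque


class PrefetchBufferTable:
    """Toy model: prefetched data is found by the base register number of a load."""

    def __init__(self):
        self.buffers = {}    # base address register number -> queued prefetched data

    def start_prefetch(self, base_reg, data):
        """Associate a prefetched data stream with one base address register."""
        self.buffers[base_reg] = deque(data)

    def load(self, base_reg, address, memory):
        """An ordinary load: served from the buffer if its base register matches."""
        queue = self.buffers.get(base_reg)
        if queue:
            return queue.popleft()    # transfer from the prefetch buffer
        return memory[address]        # no prefetch is tied to this register
```

Because the key is a field that every load already carries, the load instruction itself needs neither a new opcode nor an extra identification field.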



8) Look-ahead control

When the execution time of the corresponding memory access
instruction is defined as reference, the time at which the
corresponding data is read can be controlled. Namely, control can be
carried out that allows only the data necessary for instruction
execution up to n times ahead of the execution of the memory access
instruction to be prefetched. It is therefore possible to avoid
contamination of the cache with the unnecessary data. Further, the
look-ahead control can be also used for the assurance of data
coherency.

9) Coherency control on PDR 

The coherency between the PDR and cache is ensured by hardware using
a store address check and a mechanism called prefetch buffer
redirection PBR. Namely, when a store instruction updates a
prefetching region after the startup of prefetch, the hardware
ensures that a memory access instruction refers to updated data.

10) Data skip function associated with IF statement

Even if an IF statement exists in a loop, the present prefetch can
be applied by skipping prefetched data under empty transfer even
when a conditional memory access instruction is suppressed from
execution.
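
Why the empty transfer is needed can be seen from a short sketch. It assumes that one element is prefetched per loop iteration regardless of the IF condition; the function name and the `empty_transfer` switch are hypothetical and serve only to contrast the two behaviours.

```python
from collections import deque


def conditional_sum(values, conditions, empty_transfer=True):
    """Sum values[i] for which conditions[i] holds, reading from a prefetch queue."""
    pdr = deque(values)          # one prefetched element per loop iteration
    total = 0
    for take in conditions:
        if take:
            total += pdr.popleft()   # load executed: data goes to the processor
        elif empty_transfer:
            pdr.popleft()            # load suppressed: discard the element anyway
    return total
```

With the empty transfer the queue stays aligned with the loop iterations; without it, a later load would receive the element that belonged to a suppressed one.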

11) Address skip function

Upon generating prefetch addresses relative to a sequential vector,
the addresses can be updated on a non-linear basis at predetermined
intervals. Thus, when the innermost loop is repeatedly executed to
partially access a multidimensional array, the entire necessary data
can be prefetched by one prefetch startup. This is effective at
reducing startup overheads at the prefetch and providing the
speeding up.
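
A non-linear address update of this kind can be sketched as a generator. The parameter names `stride`, `run_length` and `skip` are assumptions chosen for the illustration and do not correspond to named fields of the embodiment.

```python
def prefetch_addresses(start, stride, run_length, skip, total):
    """Yield `total` addresses: a regular stride, plus an extra jump after each run."""
    address = start
    for k in range(total):
        yield address
        address += stride
        if (k + 1) % run_length == 0:
            address += skip          # jump over the part of the array not accessed
```

For example, for a column-major array with 8 rows of 8-byte elements of which only the first 3 rows of each column are accessed, `prefetch_addresses(0, 8, 3, 40, 7)` produces 0, 8, 16, 64, 72, 80, 128, so a single startup covers successive executions of the innermost loop.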

12) Context switch that eliminates the need for saving and recovery
of data

A context switch incident to an interrupt makes it unnecessary to
perform saving and recovery of a PDR. Data abandoned upon
interruption are read again by a PBR mechanism after the return.
This is effective at speeding up the context switch.

According to various features of the present invention, as has been
described above, an information processing apparatus having a
prefetch circuit with a higher function can be provided.

According to one feature of the present invention, for example, data
held in a cache memory can be used at prefetching in the information
processing apparatus in which the cache memory is provided for a
memory such as a main memory or the like.

According to another feature of the present invention, a plurality
of groups of data can be prefetched.

According to a further feature of the present inven- 
tion, data having structures complex as compared with 
a simple vector such as an indexed vector or the like 
can be also prefetched. 

According to a still further feature of the present in- 
vention, a prefetch circuit can be obtained which is ap- 
plicable even when an IF statement exists in a loop, by 
skipping prefetched data under empty transfer even 
when a conditional memory access instruction is sup- 
pressed from execution. 

According to a still further feature of the present in- 
vention, when prefetch addresses relative to a sequen- 
tial vector are generated, the addresses can be non-lin- 
early updated at predetermined intervals. Thus, when 
the innermost loop is repeatedly executed to partially ac- 
cess a multidimensional array, the entire necessary data 
can be prefetched by one prefetch startup. This is effec- 
tive at reducing startup overheads at the prefetch and 
providing the speeding up. 

According to a still further feature of the present invention, even
if an interrupt is generated within the information processing
apparatus, the use or non-use of prefetched data can be controlled.

Claims 

1 . An. information processing apparatus, comprising: 

(a) a storage device for holding a program and data;

(b) a processor connected to said storage device for
executing instructions included in the program;

(c) a cache memory connected to said storage device for
holding a copy of blocks held therein;

(d) a cache control circuit connected to said cache memory
and said processor for controlling accesses to said cache
memory;

(e) a prefetch circuit connected to said processor and said
storage device for prefetching a group of data designated by
said program from said storage device, before said program
uses said group of data;

wherein said prefetch circuit includes:

(e1) a group of storage regions;

(e2) a prefetch data request circuit responsive to a data
prefetch request issued by said processor for sequentially
issuing to said cache control circuit a group of prefetch
data read requests which request readout of a group of data
designated by said data prefetch request, and responsive to
subsequent supply of said group of data to said prefetch
circuit for sequentially writing said supplied group of data
into said group of storage regions, according to a
predetermined order of said storage regions; and

(e3) a prefetch data supply circuit responsive to a data read
request issued by said processor after said data prefetch
request for detecting whether data designated by said data
read request is held in one of said group of storage regions,
and for transferring said designated data from one of said
group of storage regions to said processor, in case said
designated data is held in said one storage region;

wherein said cache control circuit includes:

(d1) a prefetch data transfer circuit responsive to each of
said group of prefetch data read requests issued by said each
prefetch data read request circuit for transferring data
designated by said each prefetch data read request to said
prefetch circuit, in case said designated data is held in
said cache memory; and

(d2) a prefetch data read request circuit responsive to said
each prefetch data read request for requesting said storage
device to read said designated data, in case said designated
data is not held in said cache memory.
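
As a rough behavioural illustration of claim 1 (not the claimed circuitry itself), the following sketch models the storage regions as an ordered list and routes each prefetch data read request to the cache on a hit or to the storage device on a miss. The class `PrefetchUnit`, its attributes, and the `source_log` list are hypothetical names introduced only for this example.

```python
class PrefetchUnit:
    """Toy software model of elements (e1)-(e3), (d1) and (d2); names assumed."""

    def __init__(self, cache, memory):
        self.cache = cache      # dict: address -> value (stand-in for cache memory)
        self.memory = memory    # dict: address -> value (stand-in for storage device)
        self.regions = []       # (address, value) pairs, written in a fixed order
        self.source_log = []    # records where each prefetched element came from

    def prefetch(self, addresses):
        """Issue one read request per address and fill the regions in order."""
        for addr in addresses:
            if addr in self.cache:          # (d1): data held in the cache
                value = self.cache[addr]
                self.source_log.append("cache")
            else:                           # (d2): ask the storage device
                value = self.memory[addr]
                self.source_log.append("memory")
            self.regions.append((addr, value))

    def read(self, addr):
        """(e3): return the data if a storage region holds it, else None."""
        for held_addr, value in self.regions:
            if held_addr == addr:
                return value
        return None
```

A `None` result from `read` stands for the case in which the data is not held in the prefetch circuit and the ordinary cache access path would have to serve the request.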

2. An information processing apparatus according to claim 1,

wherein said prefetch circuit further includes:

(e4) a data transfer prohibit circuit which prohibits said
cache control circuit from transferring said data designated
by said data read request issued by said processor, in case
said designated data is held in one of said group of storage
regions within said prefetch circuit;

wherein said cache control circuit further includes:

(d3) a read data transfer circuit connected to said processor
and responsive to said data read request issued thereby for
transferring said data designated by said data read request
from said cache memory to said processor, under a condition
that transfer of said designated data is not prohibited by
said transfer prohibit circuit, in case said designated data
is held in said cache memory; and

(d4) a read data request circuit responsive to said data read
request for supplying said storage device with a first block
transfer request which requests transfer of a first block
which includes said designated data, in case said designated
data is not held in said cache memory, under a condition that
transfer of said designated data is not prohibited by said
transfer prohibit circuit, said read data request circuit
further transferring said designated data included in said
first block to said processor, in case said first block is
transferred to said cache memory from said storage device.

3. An information processing apparatus according to claim 2,
further comprising a storage device control circuit for
controlling accesses to said storage device;

wherein said data request circuit provided in said cache
control circuit includes a first circuit for supplying said
storage device control circuit with a second block transfer
request which requests said storage device to read out a
second block which includes data designated by one of said
group of prefetch data read requests, in case said data
designated by said one prefetch data read request is not held
in said cache memory;

wherein said storage device control circuit includes a
storage device access circuit responsive to said second block
transfer request for reading out and transferring said second
block from said storage device to said cache memory and for
transferring said data designated by said one prefetch data
read request and included in said read out second block to
said prefetch circuit.

4. An information processing apparatus according to claim 3,

wherein said prefetch data read request circuit further
includes:

a second circuit responsive to said one of said group of
prefetch data read requests for supplying said storage device
control circuit with a prefetch data transfer request which
requests readout of said data designated by said one prefetch
data read request from said storage device and transfer of
said designated data to said prefetch circuit; and

a third circuit connected to said first and second circuits
and responsive to said data prefetch request issued by said
processor for controlling said first and second circuits so
that either said first circuit supplies said second block
transfer request or said second circuit supplies said
prefetch data transfer request, depending upon whether said
data prefetch request issued by said processor includes
predetermined information;

wherein said storage device access circuit included in said
storage device control circuit includes: a fourth circuit
connected to said third circuit and responsive to said second
block transfer request for reading out and transferring said
second block from said storage device to said cache memory
and transferring said data designated by said one prefetch
data read request and included in said read out data to said
prefetch circuit, and responsive to said prefetch data
transfer request for reading and transferring said data
designated by said one prefetch data read request from said
storage device to said prefetch circuit.

5. An information processing apparatus according to claim 3,

wherein said cache control circuit includes a circuit
responsive to a write request issued by said processor which
designates data to be written and an address of a location of
said storage device for writing into said cache memory, said
data designated by said write request;

wherein said prefetch circuit includes:

a circuit responsive to said write request for detecting
whether data having said address designated by said write
request is held in one of said group of storage regions; and

a circuit connected to said detecting circuit and responsive
to detection that data having said designated address is held
in one of said group of storage regions for invalidating said
data held in said one region;

wherein said data transfer prohibit circuit includes a
circuit responsive to said data read request issued by said
processor for not prohibiting said cache control circuit from
transferring said data designated by said data read request
issued by said processor, in case said designated data is
held in one of said group of storage regions but has been
invalidated by said invalidating circuit.
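
The interplay between invalidation on a write and the transfer prohibition of claim 2 can be sketched as follows. This is an illustrative model only; `PrefetchRegions`, `read`, and the valid-flag representation are assumptions made for the example.

```python
class PrefetchRegions:
    """Storage regions with a per-entry valid flag (hypothetical model)."""

    def __init__(self):
        self.entries = {}   # address -> [value, valid]

    def fill(self, addr, value):
        self.entries[addr] = [value, True]

    def on_write(self, addr):
        """A processor write to a held address invalidates the prefetched copy."""
        if addr in self.entries:
            self.entries[addr][1] = False

    def lookup(self, addr):
        """Return (hit, value); an invalidated entry does not count as a hit."""
        entry = self.entries.get(addr)
        if entry is not None and entry[1]:
            return True, entry[0]
        return False, None


def read(addr, regions, cache):
    """Serve a read from the prefetch regions when valid, else from the cache."""
    hit, value = regions.lookup(addr)
    if hit:
        return value        # the cache transfer would be prohibited here
    return cache[addr]      # not prohibited: the cache supplies current data
```

Because the write updates the cache while only invalidating the prefetched copy, a later read falls through to the cache and sees the new value rather than stale prefetched data.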

6. An information processing apparatus, comprising: 

(a) a storage device for holding a program and data;

(b) a processor connected to said storage device for
executing instructions included in the program;

(c) a prefetch circuit connected to said processor and said
storage device for prefetching a plurality of groups of data
designated by said program from said storage device, before
said program uses said plurality of groups of data;

wherein said processor includes:

(b1) a plurality of registers which can be designated by
instructions included in said program;

(b2) a circuit for issuing a plurality of data prefetch
requests to said prefetch circuit, each data prefetch request
designating at least data structure of a group of data to be
fetched by said each data prefetch request and a base
register number of a base register for holding a base address
used in common to said group of data;

(b3) a circuit responsive to a data read instruction which
requests readout of data from said storage device for
calculating an address of said data, based upon contents held
in one of said registers having a base register number
designated by said instruction and other address information
designated by said instruction;

(b4) a circuit responsive to said data read instruction for
issuing to said prefetch circuit a data read request which
includes said calculated address of said data and said base
register number designated by said data read instruction;

said program being programmed so that a group of data read
instructions which is included in said program and each of
which requests readout of one of said group of data
prefetched by one of said plurality of data prefetch requests
designate a same base register number as one designated by
said one data prefetch request;

wherein said prefetch circuit includes: 

(c1) a plurality of groups of storage regions;

(c2) a circuit connected to said processor and said plurality
of groups of storage regions and responsive to each group of
said group of data prefetch requests issued by said processor
for assigning one group of storage regions within said groups
of storage regions to said each data prefetch request;

(c3) a circuit connected to said circuit for assigning, for
holding, in correspondence to each group of storage regions
within said groups of storage regions, a base register number
designated by one of said group of data prefetch requests
which has been assigned to said each group of storage
regions;

(c4) a prefetch data read circuit responsive to each of said
groups of data prefetch requests issued by said processor for
reading from said storage device, a group of data having data
structure designated by said each data prefetch request, and
for sequentially writing said group of data into a group of
storage regions within said groups of storage regions as have
been assigned to said each data prefetch request according to
a predetermined order of storage regions;

(c5) a circuit connected to said processor and responsive to
a data read request issued thereby for detecting, based upon
a base register number held in correspondence to each group
of storage regions, whether one group of storage regions
among said groups of storage regions have been assigned to a
data prefetch request which has designated a same base
register number as one designated by said data read request,
said detecting being executed based upon said base register
number held for each group of storage regions; and

(c6) a prefetch data supply circuit connected to said
processor and said circuit for detecting, for supplying said
processor with one of group of data held in one group of
storage regions, in case said one group of storage regions
has been assigned to said base register number as designated
by said data read request, wherein said prefetch data supply
circuit includes a circuit for sequentially reading a group
of data from one group of storage regions within said groups
of storage regions according to said predetermined order of
storage regions, in response to a group of data read requests
issued by said processor, wherein said group of data read
requests designate a same base register number as one which
has been designated by one of said group of data prefetch
requests which has been assigned to said one group of
regions.
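
A minimal sketch of the base-register matching in claim 6 follows. Each group of storage regions is tagged with the base register number of the prefetch request assigned to it, and a read request is matched against those tags. The class `MultiGroupPrefetch` and the use of a `deque` per group are assumptions made for illustration.

```python
from collections import deque


class MultiGroupPrefetch:
    """Several groups of storage regions, each tagged by a base register number."""

    def __init__(self, n_groups):
        # Each slot is None (free) or a (base_reg, queue_of_data) pair.
        self.groups = [None] * n_groups

    def prefetch(self, base_reg, data):
        """Assign a free group to the request and fill it in order."""
        for i, slot in enumerate(self.groups):
            if slot is None:
                self.groups[i] = (base_reg, deque(data))
                return i
        raise RuntimeError("no free group of storage regions")

    def read(self, base_reg):
        """Return the next element of the group tagged with base_reg, or None."""
        for slot in self.groups:
            if slot is not None and slot[0] == base_reg and slot[1]:
                return slot[1].popleft()
        return None


unit = MultiGroupPrefetch(n_groups=2)
unit.prefetch(base_reg=3, data=[10, 11])   # e.g. elements of one array
unit.prefetch(base_reg=5, data=[20])       # e.g. elements of another array
```

Reads that name register 3 drain the first group in its stored order, independently of reads that name register 5, which is why the program is required to use the same base register in the prefetch request and in the corresponding read instructions.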

7. An information processing apparatus according to claim 6,
wherein said circuit included in said processor for issuing
said plurality of data prefetch requests includes a circuit
for issuing a data prefetch request which designates data
structure of a simple vector whose elements have a
predetermined address increment, as data structure of a group
of data to be prefetched.

8. An information processing apparatus according to claim 6,
wherein said circuit included in said processor for issuing
said plurality of data prefetch requests includes a circuit
for issuing a data prefetch request which designates data
structure of a two-dimensional array, as data structure of a
group of data to be prefetched, wherein said two-dimensional
array comprises a plurality of simple vectors which are
mutually separated by a predetermined address increment,
wherein each simple vector has elements which have a
predetermined address increment.

9. An information processing apparatus according to claim 6,
wherein said circuit included in said processor for issuing
said plurality of data prefetch requests includes a circuit
for issuing a data prefetch request which designates data
structure of an indexed vector, as data structure of a group
of data to be prefetched, wherein said indexed vector
comprises a first simple vector and a second simple vector
whose elements include indexes to elements of said first
vector to be prefetched, wherein said elements of each of
said first and second simple vectors have a predetermined
address increment.
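
For an indexed vector, the addresses of the data to be prefetched depend on the contents of a second vector of indexes. A minimal sketch, assuming the index vector is read from a dictionary `memory` and that each index is scaled by a fixed element size (both assumptions for illustration, as is the function name):

```python
def indexed_vector_addresses(memory, index_base, index_stride, count,
                             data_base, elem_size):
    """Compute the addresses of data[index[i]] for i in 0..count-1.

    The index vector itself is a simple vector starting at index_base with
    a fixed address increment index_stride. Names are hypothetical.
    """
    addresses = []
    for i in range(count):
        index = memory[index_base + i * index_stride]   # read index element
        addresses.append(data_base + index * elem_size) # address of data element
    return addresses


# Index vector [2, 0, 5] stored at addresses 100, 104, 108.
memory = {100: 2, 104: 0, 108: 5}
print(indexed_vector_addresses(memory, 100, 4, 3, 1000, 8))
# [1016, 1000, 1040]
```

The index vector must be read before any data address can be formed, which matches the two-stage reading recited in claim 11.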

10. An information processing apparatus according to claim 6,
wherein said circuit included in said processor for issuing
said plurality of data prefetch requests includes a circuit
for issuing a data prefetch request which designates data
structure of a link list, as data structure of a group of
data to be prefetched, wherein said link list comprises a
plurality of stages of tables, elements of each table
including positional information of an element of one of said
plurality of tables succeeding to said each table.
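
A link list of this kind can only be prefetched by chasing the positional information stage by stage. The sketch below assumes, purely for illustration, that each element stores the address of its successor and that the number of stages is known in advance; `link_list_addresses` is a hypothetical name.

```python
def link_list_addresses(memory, start_addr, stages):
    """Follow positional information through `stages` tables.

    memory maps an element address to the address of the element in the
    succeeding table (an assumed, simplified layout).
    """
    addresses = []
    addr = start_addr
    for stage in range(stages):
        addresses.append(addr)
        if stage < stages - 1:
            addr = memory[addr]   # positional information of the next element
    return addresses
```

Each address depends on data read in the previous stage, so the reads are inherently sequential, unlike the simple and two-dimensional vector cases.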

11. An information processing apparatus, comprising: 

(a) a storage device for holding a program and data;

(b) a processor connected to said storage device for
executing instructions included in the program; and

(c) a prefetch circuit connected to said processor and said
storage device for prefetching a plurality of groups of data
designated by said program from said storage device, before
said program uses said plurality of groups of data;

wherein said processor includes a circuit for issuing a
plurality of data prefetch requests to said prefetch circuit,
each data prefetch request designating a group of data to be
fetched, said plurality of data prefetch requests including
at least one data prefetch request which includes, as data
structure designation information which designates a group of
data to be fetched by said one data prefetch request, data
structure information designating at least one other group of
data which are used to calculate addresses of said group of
data to be fetched;

wherein said prefetch circuit includes: 

(c1) a plurality of groups of storage regions;

(c2) a first read circuit connected to said processor and
said plurality of groups of storage regions and responsive to
said one data prefetch request issued by said processor for
sequentially reading said other group of data from said
storage device and for sequentially writing said other group
of data into a first group of storage regions within said
groups of storage regions according to a predetermined order
of storage regions;

(c3) a circuit connected to said groups of storage regions
and responsive to said one data prefetch request for
generating a group of addresses for said group of data to be
fetched, based upon said other group of data held in said
first group of storage regions;

(c4) a second read circuit connected to said circuit for
generating said group of addresses, for sequentially reading
out said group of data from said storage device, based upon
said generated group of addresses, and for sequentially
writing said read out group of data into a second group of
storage regions within said plurality of groups of storage
regions according to a predetermined order of storage
regions;

(c5) a circuit connected to said processor and responsive to
a data read request issued thereby for detecting whether data
requested by said data read request is held in said second
group of storage regions, based upon address information
designated by said data read request; and

(c6) a prefetch data supply circuit connected to said
processor and said circuit for detecting, for supplying said
processor with said data requested by said data read request
from said one region, in case said second group of storage
regions holds said data requested by said data read request.

12. An information processing apparatus, comprising: 

(a) a storage device for holding a program and data;

(b) a processor connected to said storage device for
executing instructions included in the program; and

(c) a prefetch circuit connected to said processor and said
storage device for prefetching a group of data designated by
said program from said storage device, before said program
uses said group of data;

wherein said processor includes a circuit for issuing a data
prefetch request requesting to prefetch a group of data which
is comprised of a plurality of sequentially-ordered partial
groups of data, each partial group including elements,
addresses thereof being spaced from an adjacent element by a
first predetermined address increment, an address of a last
element of each partial group being spaced by a second
predetermined address increment from an address of a start
element of a partial group of data succeeding to said each
partial group;

wherein said prefetch circuit includes: 

(c1) a plurality of storage regions;

(c2) an address generating circuit connected to said
processor and said plurality of storage regions and
responsive to said data prefetch request issued by said
processor for sequentially generating addresses of said group
of data, said circuit including:

a first circuit for sequentially generating addresses of data
belonging to each partial group, based upon said first
address increment; and

a second circuit responsive to generation of addresses of
data of each partial group by said first circuit for
generating an address of starting data of a partial group
succeeding to said each partial group, based upon said second
address increment;

(c3) a data read circuit connected to said address generating
circuit and said plurality of storage regions for
sequentially reading out said group of data from said storage
device, based upon said addresses generated by said address
generating circuit and for sequentially writing said read out
group of data into said group of storage regions;

(c4) a circuit connected to said processor and responsive to
a data read request issued thereby for said storage device,
for detecting whether data requested by said data read
request is held in said group of storage regions, based upon
address information designated by said data read request; and

(c5) a prefetch data supply circuit connected to said
processor and said circuit for detecting, for supplying said
processor with said data requested by said data read request
from said one region, in case said second group of storage
regions holds said data requested by said data read request.

13. An information processing apparatus, comprising: 

(a) a storage device for holding a program and data;

(b) a processor connected to said storage device for
executing instructions included in the program; and

(c) a prefetch circuit connected to said processor and said
storage device for prefetching a group of data designated by
said program from said storage device, before said program
uses said group of data;

wherein said processor includes:

(b1) a plurality of mask registers, each mask register
holding mask information which controls execution of an
instruction which designates said each mask register;

(b2) a circuit for issuing a data prefetch request to said
prefetch circuit, said data prefetch request designating a
group of data to be fetched and one of said plurality of mask
registers; and

(b3) a circuit responsive to each of a plurality of data read
instructions each of which requests readout of data from said
storage device for issuing to said prefetch circuit, each
data read request including address information designated by
said each data read instruction and contents of one of said
plurality of mask registers designated by said each data read
instruction;

wherein said prefetch circuit includes: 

(c1) a group of storage regions;

(c2) a prefetch data read circuit connected to said
processor, said storage device and said group of storage
regions and responsive to said data prefetch request issued
by said processor for sequentially reading said group of data
designated by said data prefetch request from said storage
device and for sequentially writing said group of data into
said group of storage regions according to a predetermined
order of storage regions;

(c3) a circuit connected to said processor and responsive to
each of said plurality of data read requests issued thereby
for detecting whether said data requested by said each data
read request is held in said group of storage regions, based
upon said address information designated by said each data
read request; and

(c4) a prefetch data supply circuit connected to said
processor, said circuit for detecting and said group of
storage regions, for supplying said processor with data
requested by each of said plurality of data read requests
from said group of storage regions, in case said group of
storage regions hold said data requested by said each data
read request;

wherein said prefetch data supply circuit includes:

a circuit connected to said group of storage regions and
responsive to a group of data read requests within said
plurality of data read requests issued by said processor for
sequentially reading a group of data from said group of
storage regions according to said predetermined order of said
storage regions, wherein said group of data read requests are
ones data requested by each of which is held in said group of
storage regions; and

a circuit connected to said circuit for sequentially reading
and responsive to contents of one of said plurality of mask
registers designated by each of said group of data read
requests for controlling readout of data requested by said
each data read request so that said requested data is read
out or not read out, depending upon whether said contents of
said designated mask has a predetermined value or not.
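
The effect of the mask on sequential readout can be sketched as below. The sketch assumes that a mask value of 1 means the read is executed and 0 means it is suppressed, and that a suppressed read still advances the position in the storage regions (the "empty transfer" mentioned in the description). The function name `masked_reads` is hypothetical.

```python
from collections import deque


def masked_reads(prefetched, mask_bits):
    """Deliver prefetched data in order, skipping elements whose mask bit is 0.

    prefetched: data already held in the storage regions, in stored order.
    mask_bits:  one bit per data read request (1 = execute, 0 = suppressed);
                this encoding is an assumption for illustration.
    """
    queue = deque(prefetched)
    delivered = []
    for bit in mask_bits:
        value = queue.popleft()   # the buffer position always advances
        if bit:
            delivered.append(value)
        # when bit is 0 the element is dropped without being supplied
    return delivered


print(masked_reads([10, 11, 12, 13], [1, 0, 1, 1]))  # [10, 12, 13]
```

Advancing past suppressed elements keeps the buffer aligned with the loop iteration, so a read in a later iteration still receives the element that belongs to it.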

14. An information processing apparatus according to
claim 13,

wherein said data prefetch request issued by said processor
includes mask use information which indicates use of contents
of one of said plurality of mask registers designated by a
data read request issued by said processor;

wherein said prefetch data supply circuit includes a circuit
connected to said circuit for controlling, for prohibiting
said circuit for controlling from responding to contents of
one of said plurality of mask registers designated by each of
said group of data read requests, in case said mask use
information designated by said data prefetch request does not
indicate use of contents of a mask register.

15. An information processing apparatus, comprising: 

(a) a storage device for holding a program and data;

(b) a processor connected to said storage device for
executing instructions included in the program; and

(c) a prefetch circuit interposed between said processor and
said storage device;

wherein said processor includes a circuit for issuing a data
prefetch request to said prefetch circuit, said data prefetch
request designating a group of data to be fetched;

wherein said prefetch circuit includes: 



(c1) a prefetch data read circuit connected to said processor
and said storage device and responsive to said data prefetch
request issued by said processor for sequentially reading
said group of data designated by said data prefetch request
from said storage device;

(c2) a group of storage regions for holding said group of
data;

(c3) a prefetch data supply circuit connected to said
processor and said prefetch data read circuit and responsive
to execution of an instruction by said processor which
requests reference to one of said group of data held in said
group of storage regions, for supplying said processor with
said data designated by said instruction from said group of
storage regions, in case said group of storage regions hold
said data requested;

(c4) a circuit connected to said processor, said prefetch
data read circuit and said prefetch data supply circuit and
responsive to an interruption generated in said processor for
saving first information specifying said group of data,
second information specifying data not yet read from said
storage device by said prefetch data read circuit among said
group of data at occurrence of said interruption, and third
information specifying data already supplied to said
processor by said prefetch data supply circuit at occurrence
of said interruption among data already read out by said
prefetch data read circuit;

(c5) a circuit connected to said processor and responsive to
completion of interruption processing executed by said
processor for recovering said saved first to third
information; and

(c6) a circuit connected to said circuit for recovering and
responsive to said recovered first to third data for
sequentially prefetching part of said group of data from said
storage device, said part of data including data already read
from said storage device by said prefetch data read circuit
but not yet supplied to said processor by said prefetch data
supply circuit, until occurrence of said interruption, and
said part of said group of data further including data not
yet read from said storage device by said prefetch data read
circuit among said group of data, until occurrence of said
interruption.